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I.     INTRODUCTION 

The  use  of  fiber  reinforced  composite  materials  in  load  bearing 
structures  has  many  advantages  over  conventional  materials  such  as  steel  and 
aluminum.  The  high  stiffness  to  weight  ratio,  strength  to  weight  ratio,  and 
resistance  to  corrosion  are  among  the  more  commonly  known  advantages. 
Perhaps  the  most  significant,  but  not  widely  known  benefit  of  using 
composites  is  the  redundancy  intrinsic  in  the  material.  The  redundancy 
results  from  the  load  being  shared  among  a  multitude  of  fibers  imbedded  in 
a  binding  matrix  material.  However,  in  order  to  realize  the  benefit  of 
increased  redundancy,  it  must  be  quantified  in  terms  of  reliability;  i.e.  the 
mathematical  model  and  the  parameters  specific  to  the  composite  require 
determination. 

As  with  any  structural  design,  it  is  imperative  that  reliability 
considerations  be  incorporated  in  the  engineering  design  of  a  composite. 
Depending  on  the  application,  this  may  take  the  form  of  either  functional 
reliability  or  human-safe  reliability.  Functional  reliability  refers  to  the  level 
of  assurance  that  the  system  under  consideration  will  function  within  the 


specified  design  parameters  over  the  design  life.  If  the  life  or  injury  of  a 
human  being  is  at  stake,  the  issue  becomes  one  of  human-safe  reliability. 
In  either  case,  a  description  is  required  of  how  the  material  used  in  the 
design  will  behave  under  a  given  stress  over  time.  This  description  is  not 
possible  without  empirical  data  of  the  material  in  both  strength  and  life. 

Strength  and  life  data  for  conventional  materials  can  be  located  in  many 
material  and  design  reference  books.  The  reason  for  this  is  that  those 
materials  have  been  in  use  for  so  many  years  that  the  accumulated  data  is 
sufficient  to  adequately  describe  the  relevant  failure  processes  with  a 
parametric  model.  This  is  not  the  case  in  the  use  of  the  composite  materials, 
for  numerous  reasons.  Composites  have  a  relatively  short  history  of  use,  and 
when  utilized,  the  applications  are  typically  considered  high  technology. 
High  technology  applications  normally  do  not  produce  large  numbers  of 
sample  sets.  Being  on  the  cutting  edge  means  limited  lead  time,  which 
further  complicates  the  problem  of  reliability  characterization.  This  limited 
base  of  failure  data  requires  that  experiments  be  performed  in  order  to  make 
it  possible  to  understand  how  the  composite  behaves  in  time  under  load  and 
to  estimate  the  associated  parameters  of  the  relevant  probability  model. 


Experiments,  however,  can  be  extremely  costly  in  terms  of  both 
resources  and  time.  In  particular,  life  experiments  present  a  formidable 
problem  because  of  the  large  variability  in  life  data.  The  large  variation  in 
the  life  data  means  that  a  substantial  amount  of  time  will  pass,  possibly 
hundreds  of  years,  before  all  samples  on  test  fail.  Surely  the  designer  is  not 
going  to  wait  to  actually  observe  even  the  mean  life  of  such  a  material  in  a 
life  test.  The  solution  to  this  problem,  and  the  objective  of  this  study,  is  to 
design  the  experiment  to  maximize  the  amount  of  information  obtained  under 
constrained  resources  and  time.  In  order  to  achieve  this  objective,  simulation 
is  required  because  reliability  is  probabilistic,  involving  many  combinations 
of  the  random  variables.  The  problem,  therefore,  cannot  be  directly  cast  in 
terms  of  classical  optimization  techniques.  An  objective  function  cannot  be 
specified  to  be  minimized  because  the  functions  and  the  parameters  of  the 
function  are  not  deterministic. 

The  process  simulated  in  this  work  is  the  random  nature  in  which  actual 
materials  fail  in  both  load  exceedance  (strength)  and  time  exceedance  (life). 
The  random  sets  of  computer  generated  failure  data  are  used  to  simulate  the 
kinds  of  results  an  actual  experiment  might  produce.  The  intended  methods 
of  data  analysis,  such  as  non-parametric  and  parametric  methods,  are  then 


applied  to  the  simulated  data  to  determine  if  they  will  be  adequate  to 
produce  the  desired  level  of  confidence  in  quantifying  the  material  reliability. 
In  the  parametric  approach,  a  model  is  selected  based  on  the  physical  failure 
process  of  a  sample  in  strength  or  life.  For  the  known  model,  the  parameters 
of  the  model  require  determination.  Since  the  parameters  themselves  are 
probabilistic  and  will  never  be  known  precisely,  they  can  only  be  estimated. 
Simulation  is  used  to  determine  the  impact  on  the  statistics  of  the  estimated 
parameters  when  a  proposed  experiment  procedure  is  executed. 

The  criterion  which  will  be  used  to  evaluate  the  knowledge  gain 
resulting  from  one  method  of  performing  an  experiment  compared  to  another 
is  the  change  in  the  information  of  the  parameters.  The  information  of  a 
parameter  is  quantified  through  the  application  of  information  theory. 
Information  will  be  shown  to  be  a  scalar  value,  which  can  be  used  as  an 
optimality  condition  in  the  design  of  the  experiment.  In  an  optimization 
sense,  the  design  variable  is  the  choice  of  when  to  censor  an  experiment 
prior  to  the  completion  of  all  the  tests  to  allow  the  multiple  use  of  limited 
equipment  in  order  to  maximize  the  objective  function  of  information. 

The  methods  of  optimizing  information  in  an  experiment  developed  in 
this  study  through  the  use  of  simulation  are  being  applied  to  a  graphite  fiber 


life  experiment  in  progress  at  the  Mechanics  of  Materials  for  Composite 
Reliability  Laboratory  of  the  Naval  Postgraduate  School,  Monterey  CA. 


II.  BACKGROUND 

The  purpose  of  this  chapter  is  to  provide  the  background  information 
which  motivated  this  investigation.  This  chapter  is  divided  into  three 
sections:  methodologies  of  design  life  in  engineering,  the  composite  failure 
mechanism,  and  information  theory.  The  first  section  is  a  contrast  of  the 
factor-of-safety  and  reliability  approaches  used  in  engineering  design  to 
determine  the  design  life  of  a  component  or  system.  The  second  section  is 
a  description  of  the  failure  mechanisms  of  the  composite  in  strength  or  life. 
The  final  topic  is  a  discussion  of  information  theory  and  how  this  concept 
might  be  implemented  in  the  design  of  an  experiment. 

A.     METHODOLOGIES  OF   DESIGN  LIFE  IN  ENGINEERING 

There  are  two  different  approaches  used  in  mechanical  engineering 
design  to  determine  the  design  stresses  and/or  life  of  a  component  or  system. 
They  are  known  as  the  deterministic  or  factor-of-safety  approach  and  the 
reliability  approach  [Shigley  and  Mischke,!]. 


1.      The  Factor-of-Safety  Approach 

This  approach  is  based  on  the  assumptions  that  the  applied  stresses, 
strength,  and  life  of  a  specimen  are  deterministic.  Since  a  probabilistic 
model  is  not  needed,  this  method  is  simple  to  use  in  the  design  process  and 
is  popular  for  that  reason.  It  is  applicable  for  materials  with  extended 
engineering  applications  where  experience  can  be  used  to  adjust  design 
parameters. 

A  common  method  used  in  the  factor-of-safety  approach  to  obtain 
the  design  life  of  a  component  subjected  to  alternating  and  steady  stresses 
is  the  implementation  of  the  stress  versus  number  of  cycles  to  failure  curve 
(S-N  curve).  The  S-N  curve  is  obtained  by  placing  test  samples  under 
specific  stress  levels  and  counting  the  number  of  stress  reversals  or  cycles 
up  to  the  sample  failure.  For  materials  whose  life  is  sensitive  to  the  duration 
of  stress,  a  similar  curve  could  be  constructed  for  static  stress  or  for  cases 
where  time  rather  than  cycles  is  the  random  variable.  This  is  done  by  having 
the  abscissa   represent  time  to  failure  rather  than  cycles  to  failure. 

The  S-N  curve  provided  in  Figure  2.1  exhibits  the  features  of 
typical  curves  obtained  for  many  materials.  The  term  endurance  limit  (Se) 
is  used  to  define  the  stress  at  which  the  slope  of  the  S-N  curve  is  flat  or 
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approaching  zero.  This  infers  that  for  a  design  stress  with  a  magnitude  less 
than  the  endurance  limit,  the  material  will  have  infinite  life.  Since  the 
material  will  have  infinite  life,  the  design  life  is  no  longer  required  to  be  a 
design  parameter  and  is  removed  from  the  analysis.  However  implicitly, 
additional  experience-based  modifications  are  needed  to  describe  the  stress 
time  history. 

There  are  fatigue  failure  methods  used  in  design  which  are  based 
on  the  existence  of  an  endurance  limit  partitioned  into  the  constant  stress 
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time  history  (mean  stress)  and  the  alternating  stress  time  history  (cyclic 
stress).  Three  of  the  more  common  theories  are  the  Soderberg,  Modified 
Goodman,  and  Gerber  criteria.  These  failure  theories  can  be  used  to 
compute  a  safety  factor  for  stress  for  particular  values  of  mean  and 
alternating  stresses  in  the  design,  or  to  determine  the  mean  or  alternating 
stress  corresponding  to  a  desired  factor  of  safety  in  life. 

There  are  two  major  shortcomings  in  using  the  S-N  curve  in 
design.  The  first  shortcoming  is  that  in  physical  systems,  there  is  no  such 
thing  as  infinite  life.  The  factually  observed  failures  at  a  stress  lower  than 
the  endurance  limit  cannot  be  quantitatively  defined.  In  fact,  because  the 
slope  of  the  S-N  curve  is  approaching  zero,  life  is  not  single  valued  in  terms 
of  stress;  i.e.  according  to  the  S-N  curve  small  variations  in  applied  stress 
levels  will  give  rise  to  infinite  variability  in  life. 

The  second  shortcoming  is  that  the  incremental  increase  in  safety 
or  functionality  which  results,  for  example,  when  a  safety  factor  of  2  is  used 
instead  of  1.75  cannot  be  quantified.  Therefore,  the  reliability  approach  is 
required  to  rationally  answer  the  question  often  posed  in  the  design  process 
as  to  what  is  the  likelihood  that  a  component  under  design  will  fail  in  the 
service  life. 


2.      Reliability  Approach 

The  reliability  approach  is  based  on  the  formulation  that  the 
applied  stresses,  strength,  and  life  used  in  design  are  not  deterministic.  The 
objective  is  to  select  the  random  variables  salient  to  the  failure  process,  and 
describe  these  variables  by  either  a  nonparametric  or  parametric  distribution 
so  that  a  probable  value  for  the  component  or  system  reliability  can  be 
calculated.  The  random  variables  required  to  determine  the  reliability  of  a 
composite  structure  are  the  applied  stress  and  the  duration  of  time  for  which 
the  stress  is  applied.  The  realizations  of  the  random  variables  are  the  stress 
at  failure  of  the  structure  (strength)  and  the  duration  of  time  up  to  failure 
(life).  The  failure  of  a  structure,  therefore,  is  represented  as  a  joint 
probability  distribution  in  strength  and  life,  in  lieu  of  the  S-N  curve 
described  in  the  previous  section.  Pragmatic  safety  factors  are  no  longer 
required  because  the  reliability  of  the  component  is  quantifiable  with  this 
joint  distribution. 

As  an  illustration  of  why  the  determination  of  the  distributions  is 
important,  consider  schematically  the  joint  probability  of  failure  distribution 
for  a  material  in  Figure  2.2.  In  this  figure,  the  locii  of  the  median  failures 
is  a  solid  line  representing  50%  of  the  samples  that  would  have  failed.  The 
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way  in  which  this  curve  is  used  is  that  the  designer  will  enter  the  graph  with 
the  service  stress  and  determine  the  mean  life  corresponding  to  that  stress. 
This  point  is  denoted  as  point  'A'  in  Figure  2.2.  In  the  factor-of-safety 
approach,  this  mean  life  will  be  reduced  by  some  safety  factor  to  the  design 
point  labelled  'B'  in  the  figure,  which  becomes  the  design  point.  The  dotted 
line  closest  to  the  median  curve  in  Figure  2.2  represents  the  loci  of  points  for 
the  maximum  allowable  number  of  failure,  say  0.01%,  for  the  case  where  the 
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distributions  of  the  strength  and  life  data  have  small  variability.  With  respect 
to  the  design  point,  this  is  a  safe  design,  as  the  probability  of  failure  is  very 
low  for  desired  stress  and  service  life.  The  dotted  line  furthest  from  the 
median  curve  in  Figure  2.2  is  the  loci  of  the  points  of  the  same  maximum 
allowable  number  of  failure  (0.01)  but  for  a  distribution  representing  larger 
variability  in  the  strength  and  life  data.  This  situation  is  higher  risk  with 
respect  to  the  design  point,  because  although  the  design  appears  safe  relative 
to  the  curve  of  median  failures,  the  actual  probability  of  failure  is  larger  than 
the  specified  maximum. 

This  clearly  shows  why  the  reliability  approach  results  in  a  much 
safer  design.  The  S-N  curve  used  in  the  factor-of-safety  approach  does  not 
contain  any  information  as  to  how  the  failures  are  distributed  about  the  mean 
values.  This  approach  is  useful  only  when  a  large  amount  of  data  or 
experience  is  available  for  the  materials  used  in  the  design  where  an 
appropriate  safety  factor  can  be  pragmatically  chosen.  In  the  use  of 
composite  materials  which  utilize  high  technology  fibers  such  as  graphite, 
the  data  or  experience  in  the  material  is  not  sufficient  to  warrant  the  use  of 
safety   factors.      The   relevant   data  must,   therefore,   be   obtained   from 
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experiments  which  are  designed  to  produce  the  greatest  knowledge  of 
parameters  for  describing  the  distribution  of  the  failure  data. 

B.     COMPOSITE  FAILURE  MECHANISMS 

1.      Strength  Model 

In  order  to  model  the  failure  mechanism  of  a  composite  in 
strength,  it  is  necessary  to  understand  how  the  fiber  and  matrix  constituents 
interact  when  the  structure  is  stressed.  A  fiber-reinforced  composite  consists 
of  many  long,  small  diameter  fibers  embedded  in  a  matrix  binder  material. 
The  role  of  the  fibers  is  to  act  as  the  load  carrying  members  in  the  material 
whereas  the  matrix  serves  to  transfer  the  load  from  broken  to  adjacent  non- 
broken  fibers.  If  the  stress  at  a  point  in  the  structure  is  high  enough  to  cause 
a  weak  fiber  to  break,  the  matrix  transfers  the  load  from  the  broken  fiber  to 
its  surrounding  intact  neighbors.  The  majority  of  load  transfer  occurs  in  the 
immediate  vicinity  of  the  break  and  decreases  as  the  distance  from  the  break 
is  increased.    This  is  shown  graphically  in  Figure  2.3. 

Rosen  [2]  developed  a  relationship  quantifying  the  distance 
required  for  the  matrix  to  transfer  the  stress  resulting  from  a  broken  fiber  to 
the  surrounding  fibers.  This  quantity  is  called  the  ineffective  length,  and  is 
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given  in  equation  form  as  follows: 


6  =  \(Vf2-l)  -^-\2  cosh'1  < 

Gm 


2[l-4>]  J 


df  (2-D 


where:       Vf  is  the  volume  fraction  of  the  fiber 
Ef  is  the  modulus  of  the  fiber 
Gm  is  the  shear  modulus  of  the  matrix 
§     is  the  fractional  value,  called  the  fiber  load  sharing 

efficiency,  below  which  the  fiber  is  not  considered  effective. 

The  transfer  of  stress  between  neighboring  fibers  within  an  ineffective  length 

will  occur  until  the  neighboring  fibers  themselves  fail  because  of     the 

increased  stress.   As  the  load  is  increased,  the  number  of  fiber  failures  will 

increase,  which  will  eventually  lead  to  the  clustering  of  fiber  failure  sites. 

The  result  is  that  the  local  stress  concentration  is  so  great  that  the  entire 

structure  catastrophically  fails. 

Knowledge  of  how  the  fiber  fails  in  strength  is  therefore  essential 

for  the  description  of  composite  failure.    A  fiber  failure  will  occur  at  the 

statistically  weakest  segment  of  the  fiber,  which  may  or  may  not  be  located 

at  a  point  of  high  stress  in  the  structure.  Since  the  failure  of  the  fiber  is  one 
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of  extreme  value  governed  by  the  weakest  element,  the  Weibull  distribution 
is  used  to  model  the  failure  process.  The  cumulative  distribution  function 
(CDF)  using  the  Weibull  distribution  is 


F(xt)  =  l-exp{(-(-^)a} 


(2.2) 


The  probability  of  failure  of  a  fiber  corresponding  to  a  particular  load  can 
be  computed  using  this  model  once  the  shape  a  and  location  P  parameters 
are  known. 
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In  order  to  determine  the  strength  of  the  composite,  it  is  necessary 
to  address  the  combinatorial  probability  of  grouping  of  fiber  failure  sites. 
The  model  which  describes  this  chanced  clustering  of  fiber  breaks  was 
developed  by  Harlow  and  Phoenix  [3,4]  and  is  known  as  the  Harlow  and 
Phoenix  Local  Load  Sharing  model.  In  this  model,  a  recursive  relationship 
is  used  to  predict  the  probability  of  failure  based  on  the  fiber  strength 
distribution,  all  of  the  possible  combinations  of  adjacent  fiber  breaks,  and  the 
resulting  stress  concentrations  from  those  breaks  within  an  ineffective  length. 
If  the  number  of  adjacent  breaks  exceeds  a  critical  value  k,  then  the  structure 
will  catastrophically  fail. 

The  Harlow-Phoenix  Local  Load  Sharing  model  was  modified  by 
Wu  and  Harlow  to  incorporate  multi-modality  in  the  fiber  strength 
distributions  which  occur  in  the  regions  of  low  and  high  probabilities  of 
failure.  These  regions  will  be  subsequently  referred  to  as  the  lower  tail  and 
upper  tail,  respectively.  The  resulting  model  is  known  as  the  Tri-modal 
Local  Load  Sharing  model. 

2.      Strength  and  Life  Model 

The  time  dependence  of  mechanical  breakdown  of  fibers  was 
pioneered  by  B.  Coleman  [5]  in  1958.    Coleman  developed  the  theory  of 
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breaking  kinetics,  which  is  concerned  with  the  problem  of  calculating  the 
probability  that  a  fiber  breaks  in  a  given  interval  of  time  when  an  ensemble 
of  fibers  is  subjected  to  a  loading  history.  In  this  theory,  he  develops  a 
breakdown  function  Q.(t)  which  is  a  measure  of  the  breakdown  in  small 
subdivisions  of  the  fiber,  called  slabs,  which  occurred  under  a  loading 
history.  Coleman  concluded  that  the  failure  of  a  fiber  ensemble  is  described 
by  a  death  potential  \|/(£2(t));  a  function  of  the  breakdown  function.  The 
interpretation  is  that  the  damage  \|/  of  the  fiber  is  measured  by  the  damage 
history  Q(t). 

Phoenix  and  Wu  [6]  provided  an  overall  synthesis  of  strength  and 
life.  The  CDF  for  the  time  dependent  failure  of  a  fiber,  in  terms  of 
Coleman's  damage  potential,  was  shown  to  have  the  form 

t 
F(t;l)  =  l-exp{-i|r[|  K(/(s)<fc]}  ,  tzO  (2-3) 

o 

where:         l(t)  is  the  stress  history 

k(-)  is  the  breakdown  rule  or  damage  function 
\|/(-)  is  the  shape  function  (death  function). 

They  were  able  to  cast  the  damage  function  in  terms  of  an  algebraic  form 

which,  when  plotted,  resembles  the  strength-life  curve  shown  in  Figure  2.2. 
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Once  the  time  dependent  failure  properties  of  the  fibers  are  known, 
a  Local  Load  Sharing  model  with  time  dependent  ineffective  length  and 
strength  may  be  developed.  This  is  possible  because  the  failure  mechanism 
of  the  composite  in  life  is  analogous  to  that  in  strength  (i.e.,  local  load 
sharing  among  neighboring  fibers),  with  the  exception  that  both  the  strength 
of  the  fiber  and  the  ineffective  length  5,  which  is  matrix  dominated,  may 
change  over  time.  The  life  of  the  composite  system  is  described  by  a 
function  of  either  a  time  dependent  fiber  strength  element  or  a  time 
dependent  ineffective  length   or  both. 

There  are  currently  life  tests  in  progress  for  AS-4  graphite  fiber 
and  composite  strands  of  the  AS-4  fiber  at  the  Mechanics  of  Materials  for 
Composite  Reliability  Laboratory  at  NPS.  The  overall  objective  of  the  tests 
is  to  produce  strength-life  data  in  order  to  determine  the  parameters  of  the 
failure  model  and  define  the  time  dependency  of  the  ineffective  length  so 
that  a  joint  probability  distribution  of  the  composite  can  be  formed  based  on 
the  properties  of  the  fiber. 
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C.     INFORMATION  THEORY 

Shannon  [7]  developed  a  mathematical  theory  which  dealt  with  the 
statistical  nature  of  the  problem  of  reproducing,  either  exactly  or 
approximately,  a  selected  message  in  communication.  He  introduced  a 
quantity  H  which  is  a  measure  of  information,  choice,  or  uncertainty  in  a 
process.  Shannon  called  this  quantity  H  entropy  as  it  serves  as  a  measure 
of  disorder;  and  it  is  defined  by  the  relationship 


H  =  -K  £  Pi  log  p,  (2-4) 


1  =  1 

where:   pp  p2,  ...  ,  pn   are  the  probabilities  of  occurrence  of  a  set  of 

possible  events 
K   is  a  positive  constant  (choice  of  a  unit  of  measure). 

Shannon  was  primarily  concerned  with  the  amount  of  choice  which  was 

involved  in  the  selection  of  an  event  or  how  uncertain  one  is  of  the  outcome. 

In  this  regard,  he  stated  that  H  would  have  the  property  of  a  monotonically 

increasing  function  of  n  if  all  p,  are  equal.     The  entropy  will  increase 

because,  with  equally  likely  events,  there  is  more  choice  or  uncertainty. 

Lindley  [8]  extended  Shannon's  statistical  concept  of  information  to  the 

notion  of  information  in  an  experiment,  rather  than  in  a  message.     He 
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suggests  the  following  rule  of  experimentation:  "perform  that  experiment  for 
which  the  expected  gain  in  information  is  the  greatest,  and  continue 
experimentation  until  a  preassigned  amount  of  information  has  been 
attained."  Lindley  further  states  that  the  maximum  information  will  result 
when  the  probability  distribution  of  the  desired  parameter  is  concentrated  on 
a  single  value  of  the  parameter,  and  that  the  information  is  reduced  when 
there  is  any  uncertainty  in  the  value  of  the  parameter.  He  defines 
information  /  to  be 

/  =  |p(6)  log/>(6)  dQ  (2-5) 

Note  that  the  form  of  this  equation  for  information  is  similar  in  form  to 
Shannon's  equation  of  entropy  (Eq  2.4)  with  the  exception  of  a  lack  of  the 
minus  sign.  This  minus  sign  was  omitted  because  of  the  major  differences 
in  the  goals  of  a  person  conducting  an  experiment  and  a  person  concerned 
with  the  choices  in  messages.  The  communication  engineer  is  more 
interested  in  maximizing  the  choice  in  messages  vice  having  concentration 
of  the  distribution  on  a  single  value.  Hence,  the  negative  in  Shannon's 
expression  of  entropy.  The  objective  of  performing  the  experiment,  however, 
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is  to  reduce  the  uncertainty  or  increase  the  information  of  the  parameters,  so 
the  negative  sign  is  not  utilized  in  Eq  2.5. 

The  interest  of  this  investigation  is  to  maximize  the  knowledge  of  the 
parameters  of  an  experiment  for  composite  reliability  characterization,  under 
the  constraints  of  equipment  and  time.  The  computation  of  the  information 
/  established  in  Eq  2.5  will  allow  a  measure  to  be  made  of  how  well  the 
parameters  of  the  model  are  known  and  measure  the  amount  of  increase  in 
information  is  achieved  by  testing  more  samples.  In  this  context,  the 
information  can  be  used  as  an  optimality  condition  in  experiment  design. 
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III.     QUANTIFICATION  OF  RELIABILITY 

As  stated  in  the  introduction,  the  quantification  of  reliability  requires 
experimental  data.  The  often  high  cost  of  experiments  mandates  that  the 
experimental  procedure  and  method  of  data  analysis  be  selected  to  glean  the 
most  information.  Once  the  data  is  obtained,  the  resulting  reliability  can  be 
quantified  using  either  a  non-parametric  or  parametric  approach. 
Regardless  of  the  preferred  approach,  the  goal  is  to  determine  a  range  of  the 
random  variable  which  can  be  utilized  within  a  specified  reliability  level. 

This  chapter  is  dedicated  to  evaluating  the  advantages  and  disadvantages 
of  non-parametric  and  parametric  approaches  in  the  characterization  of 
reliability. 

A.     NON-PARAMETRIC  RELIABILITY  CHARACTERIZATION 

Non-parametric  methods  of  characterizing  reliability  are  based  solely  on 
the  data  obtained,  without  the  analytical  modeling  of  underlying  failure 
processes.  As  an  example  of  how  data  is  analyzed  using  a  non-parametric 
method,  consider  the  graphs  in  Figure  3.1.  The  graph  in  Figure  3.1(a)  is  the 
underlying  distribution  of  the  simulated  data  set  which  is  represented  non- 
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parametrically  in  the  form  of  a  histogram.  The  three  sample  sizes  of  N=25, 
100,  and  1000  are  typical  of  the  number  of  data  available  for  materials  which 
have  little  experience  (i.e.,  very  new  material),  moderate  experience,  and 
extensive  experience,  respectively.  In  Figure  3.1(b),  the  histogram  of  the 
small  sample  size  of  N=25  samples  does  not  produce  a  meaningful  shape 
which  could  be  utilized  for  reliability  characterization.  Little  improvement 
results  when  the  number  of  samples  is  increased  to  N=100,  as  shown  in 
Figure  3.1(c).  The  number  of  samples  must  be  increased  to  N=1000  and 
beyond,  as  illustrated  in  Figure  3.1(d),  before  any  meaningful  resemblance 
to  the  underlying  distribution  can  be  made.  The  observation  is  that  non- 
parametric  methods  are  useful  only  when  large  amounts  of  data  are  available; 
and  N>1000  is  very  large  for  engineering  data. 

The  advantage  of  using  a  non-parametric  approach  is  that  only  data  is 
needed.  The  disadvantage  in  the  approach  is  that  it  is  often  not  practical  or 
possible  in  the  case  of  time  dependent  experiments  to  obtain  the  large 
amount  of  data  needed  to  make  this  approach  produce  meaningful  results. 
As  a  result,  reliability  predictions  for  new  engineering  applications  must 
almost  always  be  based  on  a  proper  model  with  adequately  estimated 
parameters  from  limited  data. 
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Non-Parametric  description  of  Probability  Density  Function 
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FIGURE  3.1   NON-PARAMETRIC  DESCRIPTION  OF  F(X) 
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B.     PARAMETRIC  RELIABILITY  CHARACTERIZATION 

The  prediction  of  reliability  in  a  parametric  sense  is  based  on  the 
selection  of  an  appropriate  model  and  estimation  of  the  associated  parameters 
from  data.  The  model  must  adequately  represent  the  physical  process  of  the 
phenomenon  being  characterized.  In  the  case  considered  herein,  the  failure 
of  a  graphite  fiber  in  tension  is  known  to  be  governed  by  the  strength  of  the 
weakest  segment  or  link  in  the  fiber.  Since  it  is  the  presence  of  a  extreme 
value,  such  as  the  weakest  link,  which  causes  the  failure,  the  process  is 
modeled  using  a  Weibull  distribution.  The  CDF  of  the  Weibull  distribution 
is  given  as 

F(x)  =  1  -  exp(-(|)a)  (3.1) 

where:       a  is  the  shape  parameter 

p  is  the  location  parameter. 

The  shape  parameter  and  location  parameter  oc  and  p  are  analogous  to  the 

reciprocal  standard  deviation  (l/o)  and  mean  (u)  used  in  the  Gaussian 

distribution,  respectively.    The  importance  of  selecting  the  correct  model 

cannot  be  overstated.  No  matter  how  diligent  the  effort  is  in  determining  the 
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parameters  of  the  model,  if  the  model  is  not  correct,  the  resulting  prediction 
will  not  be  meaningful. 

As  stated  above,  the  Weibull  distribution  is  known  to  be  the  appropriate 
model  for  predicting  fiber  reliability.  The  problem  now  becomes  one  of 
determining  the  values  of  the  unknown  parameters  a  and  p.  Since  an 
infinite  number  of  samples  cannot  be  tested,  the  true  parameters  will  never 
be  known;  they  can  only  be  estimated.  An  estimated  parameter,  or 
estimator,  will  always  possess  some  uncertainty.  This  introduces  the  concept 
that  the  estimators  themselves  may  be  described  by  a  distribution. 

The  objective  in  the  design  of  an  experiment  is  to  develop  a  method  of 
conducting  the  experiment  which  would  result  in  the  estimation  of  the 
estimators  a  and  p  within  a  desired  range.  The  circular  difficulty  in  this  is 
that  the  experimental  procedure  cannot  be  planned  in  advance  to  produce  the 
desired  range  of  the  estimators  because  little  is  known  about  the  parameters 
prior  to  running  the  experiment.  For  example,  if  the  application  driving  the 
experiment  is  one  which  requires  a  very  high  level  of  reliability,  such  as  a 
structure  in  a  nuclear  safe  system,  then  the  estimated  parameters  will  need 
to  be  determined  with  a  very  small  variance  from  the  true  values.  One  may 
jump  to  the  conclusion  that  this  would  require  extensive  testing  and  very 
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large  data  sets.  However,  it  will  be  shown  in  a  later  section  that  certain 
processes  have  the  intrinsic  property  of  small  variability  in  the  estimators,  so 
that  only  a  small  amount  of  data  would  suffice  in  achieving  the  certain 
bounds  on  the  estimators,  and  the  experiment  could  be  planned  accordingly. 
The  dilemma  is  that  the  existence  of  this  property  would  not  be  known 
without  having  first  conducted  the  experiment  to  produce  the  data.  One 
solution  of  this  dilemma  is  to  use  simulation  in  the  design  of  the  experiment. 
A  logical  representation  of  the  rationale  of  incorporating  Monte  Carlo 
simulation  in  the  experiment  design  is  provided  in  Figure  3.2. 

The  first  step  of  the  simulation  is  to  assume  the  values  of  underlying 
parameters  for  the  simulated  data  set.  If  possible,  the  selection  of  these 
parameters  should  be  within  the  anticipated  range  of  values  for  the 
underlying  parameters  of  the  actual  data  set  so  that  the  simulated  process 
will  closely  approximate  the  actual  process.  Although  it  is  true  that  little 
information  may  be  known  about  the  parameters  prior  to  commencement  of 
the  experiment,  the  expected  range  of  parameters  can  normally  be  bracketed 
about  a  limited  range  of  values.  For  example,  considering  the  life  of  a 
graphite  fiber,  it  is  anticipated  that  the  value  of  the  shape  parameter  of  the 
distribution  will  be  between  0.1  and  1  because  of  the  experience  in  glass 
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Validate  optimal  experimental  design  by  Monte  Carlo  Simulation 
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FIGURE  3.2   OPTIMUM  EXPERIMENT  DESIGN  BY  MONTE 
CARLO  SIMULATION 

fibers  and  Kevlar  fibers.  To  explore  the  effects  of  the  experiments,  two 
separate  simulations  should  be  run  with  the  underlying  values  of  a  assigned 
in  the  upper  and  lower  range  (i.e.,  0.1  and  1)  and  the  proposed  procedures 
can  be  evaluated  accordingly. 

The  random  data  set  is  simulated  using  the  known  values  of  the  above 
range  of  underlying  parameters.  Once  the  simulated  data  set  is  obtained,  the 
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experiment  design  methodology  is  applied  to  that  simulated  set  of  data.  In 
the  case  of  this  study,  an  information  based  experimental  design 
methodology  is  used  to  obtain  optimum  estimation  of  the  parameters.  The 
estimated  parameters  based  on  the  simulated  data,  denoted  a  and  p,  are 
computed  using  the  prescribed  procedure  and  are  compared  with  the  known 
parameters.  If  the  estimated  parameters  adequately  recover  the  known  values 
of  the  underlying  parameters,  the  method  is  verified  and  can  be  implemented 
on  an  actual  set  of  data.  The  analysis  of  the  actual  data  using  the  verified 
information  experimental  design  procedure  will  assure  the  optimal  parameters 
estimation. 

This  section  provided  an  overview  of  how  a  random  set  of  data  is 
simulated  and  how  it  may  be  implemented  in  an  experiment  design.  The 
remaining  portion  of  this  chapter  will  be  dedicated  to  the  presentation  of  the 
results  of  many  different  simulations  conducted  and  what  inferences  can  be 
made  regarding  the  role  of  the  parameters  in  reliability. 

C.     THE  ROLE  OF  PARAMETERS  IN  RELIABILITY 

In  order  to  determine  the  role  of  the  parameters  in  reliability,  it  is  first 
necessary  to  distinguish  the  best  graphical  domain  with  sufficient  sensitivity 
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FIGURE  33  GRAPH  OF  THE  PDF  F(X)  FOR  TWO  DATA  SETS 
in  reliability  to  identify  the  uncertainty  in  one  or  both  of  the  parameters  a 
and  p.  The  parametric  distributions  may  be  presented  either  as  a  probability 
density  function  (pdf),  cumulative  distribution  function  (CDF),  or 
transformed  CDF.  The  transformed  CDF,  denoted  F\  is  a  linearized  plot  of 
the  CDF  accomplished  by  the  weakest  link  transformation  of  the  probability 
F(Xj)  using  the  relationship 


F*  =ln(-ln(l-F(xI))) 


(3.2) 


and  transforming  the  realized  random  variable  by  ln(Xj).   These  three  plots 
are  shown  in  Figures  3.3  through  3.5,  each  containing  a  plot  of  distributions 
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resulting  from  two  simulated  data  sets  of  ten  each  with  different  values  for 
the  parameters  as  indicated. 

Two  data  sets  often  points  each  are  simulated  to  illustrate  the  graphical 
appearance  of  data  in  the  three  respective  probability  spaces.  One  large 
scatter  data  set  (open  circles)  and  one  small  scatter  data  set  (solid  circles)  are 
simulated  from  parameters  as  indicated;  the  corresponding  underlying 
distributions  are  respectively  shown  by  solid  and  dashed  lines.  In  the  density 
space  shown  in  Figure  3.3,  the  sample  number  is  too  small  to  form  a 
nonparametric  empirical  pdf,  as  previously  noted  in  section  A  of  this  chapter. 
Additionally,  even  if  enough  data  was  available  for  a  description  of  the  pdf, 
subtle  changes  of  the  curve  in  the  region  of  very  high  levels  of  reliability, 
which  correspond  to  low  levels  of  probability  of  failure  (e.g.  10"4  and  below), 
are  not  distinguishable  in  the  pdf  domain.  This  portion  of  the  distribution 
is  called  the  lower  tail,  and  its  shape  is  the  focus  of  reliability  applications. 
The  lower  tail  is  sensitive  to  variations  in  the  parameters,  making  it 
important  that  this  portion  of  the  curve  be  graphically  representable. 

While  the  same  small  data  sets  form  a  more  meaningful  trend  in  the 
CDF  space,  the  lower  tail  is  also  obscured.  The  data  points  of  the  simulated 
data  sets  are  plotted  in  this  domain  based  on  the  rank  of  each  sample.   The 
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notion  of  rank  refers  to  the  value  of  the  realized  random  variable  of  the 
sample  relative  to  the  sample  tested,  whereas  the  true  rank  is  the  value 
relative  to  all  possible  samples  that  exist  in  the  population,  tested  or 
otherwise.    A  further  discussion  of  rank  is  provided  in  Appendix  A. 

The  model  which  best  fits  this  data  is  provided  by  the  parameters  of  the 
model.  The  estimation  of  the  best  fit  parameters  is  performed  in  the  density 
domain,  which  results  in  the  likelihood  estimators.  The  likelihood  estimators 
are  the  most  likely  parameters  based  on  the  observed  data.  For  the  Weibull 
model,  the  likelihood  estimators  are  denoted  &  and  $.  Likelihood  estimators 
and  maximum  likelihood  estimators  (MLE)  are  explained  in  more  detail  in 
Appendix  B. 

Once  the  estimated  parameters  are  determined  using  likelihood 
estimation,  the  curve  representing  the  CDF  may  be  plotted.  The  two 
resulting  curves  for  the  two  simulated  data  sets  are  shown  in  Figure  3.4. 
Because  the  slope  of  the  curves  in  the  CDF  domain  is  approaching  zero  in 
the  low  range  of  probability  of  failure  they  do  not  provide  the  resolution  to 
distinguish  the  lower  tail  of  the  curve. 

The  graph  of  F*  versus  In  (Xj)  for  the  two  data  sets  is  provided  in 
Figure  3.5.    This  graph  is  useful  in  reliability  because  the  subtleties  in  the 
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FIGURE  3.4  GRAPH  OF  THE  CDF  F(X)  FOR  TWO  SETS  OF  DATA 
lower  tail  will  clearly  be  evident.  There  is  an  additional  advantage  of  using 
this  domain:  the  shape  parameter  a  is  the  slope  of  the  line,  and  the  location 
parameter  P  is  the  x  value  which  corresponds  to  the  value  of  zero  on  the  F* 
axis.  Wide  scattering  of  data  is  characterized  by  a  small  value  of  a,  or  a 
nearly  flat  line  in  the  F*  plot;  a  large  value  of  a  corresponds  to  very  little 
scatter,  which  is  represented  by  a  steep  line  in  F*. 

As  previously  mentioned,  the  area  of  interest  in  the  graph  with  respect 
to  reliability  is  the  lower  left  hand  portion  of  the  graph  which  corresponds 
to  the  region  of  low  probabilities  of  failure.  This  region  is  referred  to  as  the 
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FIGURE  3.5   TRANSFORMED  CDF  F*(X)  FOR  TWO  SETS  OF  DATA 
reliability  region  and  is  shown  in  Figure  3.6  as  the  shaded  region  in  the  F* 

graph.    The  top  of  this  region  is  bounded  on  the  F*  axis  by  the  maximum 

probability  of  failure  permitted  by  the  design  specification,  and  bounded  on 

the  Xj  axis  by  the  maximum  permitted  value  of  the  design  variable,  also 

defined  in  the  design  specification.  For  example,  the  design  specification  of 

a  composite  pressure  vessel  may  state  that  the  vessel  should  be  designed  to 

accommodate  a  maximum  stress  of  30  ksi  with  a  reliability  of  0.9999.  This 

specification  would  define  a  reliability  region  of  0.0001  on  the  F*  axis  or 
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FIGURE  3.6   RELIABILITY  REGION  IN  THE  F*  DOMAIN 

F<(1  -0.9999)  and  x  <  30  ksi  on  the  x-axis.  A  sample  of  how  the  reliability 

region  would  appear  as  the  shaded  region  in  Figure  3.6. 

The  reliability  region  is  used  to  determine  whether  the  object  under  test 

will  be  capable  of  meeting  the  design  specifications.    As  an  illustration, 

consider  the  two  linearized  curves  of  the  simulated  data  sets  in  Figure  3.6. 

The  curve  which  characterizes  the  data  set  with  a  small   amount  of  scatter 

(a=10)  passes  below  the  reliability  region.   This  is  representative  of  a  very 

safe  design,  or  possibly  an  overdesign,    because  the  maximum  allowed 

probability  of  failure  is  not  reached  until  relatively  high  values  of  the  random 
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variable  are  reached.  Or  stated  differently,  for  the  maximum  permitted  value 
of  the  random  variable  for  the  design,  such  as  stress  for  a  structure,  the 
probability  of  failure  is  much  lower  than  required.  The  designer  may  then 
evaluate  whether  the  cost  of  this  extra  safety  is  justifiable. 

The  curve  representing  the  simulated  data  set  with  a  large  amount  of 
scatter  (a=2)  passes  above  a  portion  of  the  reliability  region.  In  this  case 
the  design  does  not  meet  the  required  reliability  and  safety  specification  for 
certain  values  of  the  random  variable.  This  result  would  indicate  that  either 
the  design  be  reworked  to  meet  the  specifications  or  the  specifications  be 
relaxed  to  accommodate  less  reliability  and  safety. 

The  parameters  of  the  model  selected  to  describe  the  physical  process 
have  been  shown  to  be  essential  in  the  quantification  of  reliability.  The 
reliability  region  was  identified  in  evaluating  the  impact  of  the  values  for  the 
parameters  on  the  design.  The  next  section  will  address  how  the  estimated 
parameters  of  the  model  vary  with  the  number  of  data  tested  and  the  type  of 
data  obtained  from  an  experiment. 
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D.     STATISTICS  OF  THE  ESTIMATED  PARAMETERS 

In  order  to  show  how  the  statistics  of  the  estimated  parameters  vary, 
multiple  simulations  were  conducted  with  different  numbers  of  samples  in 
a  data  set  and  different  values  of  the  underlying  parameters.  For  each 
simulation  run,  the  MLE  parameters  were  computed  for  the  generated  data 
and  tabulated.  Since  there  are  two  parameters  estimated,  the  relative 
frequency  of  occurrence  of  an  d,  $  pair  is  represented  as  a  three-dimensional 
histogram.  Each  column  in  the  histogram  would  represent  the  number  of 
occurrences  that  the  MLE  parameter  fall  within  a  specified  band,  called  the 
class  interval. 

The  underlying  values  of  the  shape  parameter  a  selected  for  the  three 
simulations  are  0.8,  5,  and  20.  The  rationale  for  selecting  these  values  is 
that  they  are  numbers  which  are  typical  in  describing  life  data  for  a  Kevlar 
fiber,  strength  data  for  a  graphite  fiber,  and  strength  data  for  a  composite 
strand,  respectively.  These  numbers  not  only  illustrate  the  statistics  of  the 
parameters  in  the  area  of  composite  reliability,  the  interpretations  can  also 
be  extended  to  data  sets  in  general  which  are  characterized  by  a  large 
amount  of  data  scatter  (oc=0.8),  intermediate  amount  of  scatter  (a=5),  and 
small  scatter  (a=20).   The  two  sample  sizes  selected  for  the  simulation  are 
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N=10  and  N=100  samples.  The  rationale  for  the  selection  of  these  two 
numbers  of  samples  is  that  they  are  typical  bounds  used  in  the  quanification 
of  reliability  in  engineering  design. 

The  MLE  parameters  were  calculated  for  each  simulated  data  set  of 
N=10  and  N=100  samples  and  stored.  The  process  was  then  repeated  10,000 
times;  resulting  in  10,000  &  and  $.  The  objective  of  the  simulation  was  to 
observe  any  differences  which  might  exist  in  the  distributions  of  the 
estimators  as  a  function  of  the  values  of  the  shape  parameters  and  sample 
sizes. 

The  location  parameter  P  was  not  varied  in  the  simulation  because  the 
variability  can  be  observed  in  the  normalized  form. 

1.      Inferences   on   the   Statistics   of  the   Parameters   based   on 
Simulation 

The  results  of  the  simulations  are  provided  in  Figures  3.6,  3.7,  and 
3.8.  Each  figure  is  a  histogram  representing  the  relative  frequencies  of  the 
estimators  and  the  corresponding  marginal  distributions  which  are  obtained 
by  projecting  the  histogram  on  the  d,  and  $  axes.  Stated  mathematically  for 
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(a)  Underlying  shape  parameter  a  =  5,  data  sets  of  N=10  samples 
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discrete  random  variables,  the  marginal  distribution  is  defined  as 

•  (3.7) 

fy(y)  =  }2  /(w) 

n 

where:     fx(x)  are  the  marginal  distributions  of  the  x  random  variable. 
fy(x)  are  the  marginal  distributions  of  the  y  random  variable. 
f(x„,y)  is  the  joint  probability  distribution  of  x^  with  y  constant. 
f(x,yn)  is  the  joint  probability  distribution  of  yn  with  x  constant. 

It  is  important  to  note  that  in  each  of  the  histogram  or  marginal  distribution 

plots,  the  a  and  $  axes  have  been  normalized  by  the  known  underlying 

values  so  that  the  true  values  will  be  represented  as  the  number  1.0  on  each 

of  the  axes. 

The  histograms  and  marginal  distributions  for  the  case  where 

oc=0.8  and  N=10  and  N=100  are  provided  in  Figure  3.6.  The  joint  histogram 

in  Figure  3.6(a)  is  characterized  by  a  large  scatter  of  the  computed  MLE 

parameter  for  the  case  of  N=10.     This  is  also  evident  in  the  marginal 

distributions  of  &  and  $.  The  scatter  was  reduced  when  more  samples  were 

tested  in  each  data  set,  as  shown  in  the  case  when  N=100.   In  Figure  3.6(b) 

the  marginal  distributions  of  &  and  $  indicate  that  a  tenfold  increase  in  the 

amount  of  data  in  each  sample  set  significantly  reduces  the  uncertainty  of 
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recovering  the  correct  parameters.  It  is  important  to  note  that  for  both  cases 
where  N=10  and  N=100,  the  distribution  of  the  estimated  location  parameter 
$  is  wider  than  the  shape  parameter  6c.  The  overall  interpretation  is  that 
when  a  is  small  (i.e.,  the  data  is  largely  scattered)  there  exists  a  large  range 
of  parameters  which  would  be  likely  to  describe  the  data. 

The  underlying  value  of  the  shape  parameter  a  in  Figure  3.7  was 
increased  to  a=5  in  order  to  represent  sets  of  data  that  possess  an 
intermediate  amount  of  scatter.  For  the  case  of  very  small  sample  size 
N=10,  as  shown  in  Figure  3.7(a),  the  uncertainty  in  the  MLE  a  is 
approximately  the  same  as  that  shown  for  oc=0.8.  The  uncertainty  in  the 
location  MLE  $,  however,  is  dramatically  reduced  when  compared  to  the  $ 
curve  in  Figure  3.6(a).  The  uncertainty  in  the  both  estimators  is  reduced  by 
increasing  the  sample  size  to  N=100.  But  as  illustrated  in  Figure  3.7(b),  the 
amount  of  the  reduction  in  the  uncertainty  of  p  obtained  by  increasing  the 
number  of  samples  simulated  in  each  data  set  is  not  as  great  compared  to  the 
reductions  obtained  for  a=0.8.  The  interpretations  of  these  results  are  that 
since  the  scatter  in  the  data  of  the  simulated  data  sets  was  reduced,  the 
location  parameter  can  be  estimated  fairly  accurately,  even  with  a  small 
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amount  of  data.  The  increase  in  the  samples  comprising  each  data  set 
significantly  reduced  the  uncertainty  of  the  estimator  d. 

A  data  set  which  has  little  scatter  in  the  data  is  representable  by 
a  large  value  of  the  shape  parameter.  The  results  of  the  simulation  for  this 
type  of  data  set  are  provided  in  Figure  3.8,  where  oc=20.  Again,  as  shown 
in  Figure  3.8(a),  the  uncertainty  in  d  is  approximately  the  same  as  that 
obtained  for  oc=0.8  and  a=5.  Note  that  the  $  is  known  very  well,  and  the 
improvement  in  the  uncertainty  of  this  estimator  is  not  detectable  when  the 
sample  data  set  is  increased  to  N=100  samples,  as  illustrated  in  Figure  3.8(b). 

If  the  goal  of  the  experiment  was  to  estimate  $  (the  estimator  of 
the  location  parameter  p),  then  for  data  which  is  described  by  very  little 
scatter  (large  shape  parameter),  increasing  the  number  of  samples  provides 
little  improvement  on  the  level  of  uncertainty  in  $.  A  similar  trend  is 
expected  for  the  estimated  shape  parameter  d.  Although  this  trend  was  not 
demonstrated  for  the  series  of  simulations  performed  here,  it  is  anticipated 
that  for  very  high  values  of  a,  say  a>50,  only  a  small  amount  of  data 
would  be  required  to  estimate  the  parameter  with  a  high  degree  of  certainty. 
In  the  limiting  case  where  a=<*>,  the  curve  of  the  distribution  would  be  a 
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vertical  line  in  the  F*  domain,  and  only  one  realization  of  the  random 
variable  would  be  required  to  define  the  parameters. 

The  influence  of  the  number  of  samples  in  the  data  set  and  the 
associated  scatter  possessed  by  that  data  set  significantly  affects  the 
uncertainty  of  the  estimator.  In  some  experiments,  only  limited  amount  of 
data  would  be  required  to  determine  the  parameters;  in  others  generous 
amounts  of  data  would  be  needed.  It  is  clear  that  some  form  of  optimality 
condition  should  be  devised  which  would  assign  a  value  to  the  amount  of 
uncertainty  in  the  estimated  parameters.  The  experiment  could  therefore  be 
designed  so  that  the  uncertainty  is  minimized  under  the  constraints  of  limited 
equipment  and  time.  The  formulation  of  this  optimality  condition  is  the 
subject  of  Chapter  IV. 
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IV.     OPTIMIZATION  OF  INFORMATION  IN  AN  EXPERIMENT 

The  statistics  of  the  estimated  parameters  are  influenced  by  both  the 
amount  of  data  obtained  from  an  experiment  and  the  intrinsic  scatter  in  the 
data.  In  many  cases,  large  amounts  of  data  are  required  to  estimate  the 
parameters  within  the  desired  degree  of  certainty.  However,  an  experiment 
will  only  produce  a  limited  amount  of  data.  Experiments  are  commonly 
limited  in  the  time  available  to  conduct  them,  or  by  the  capacity  of  the  test 
equipment.  In  time-limited  experiments,  such  as  in  life  and  fatigue  tests, 
production  deadlines  often  limit  the  amount  of  time  allowed  to  perform  the 
required  testing  and  resources  limit  the  number  of  experiments  which  can  be 
run  concurrently.  Capacity  limited  data  refers  to  the  situation  where  the 
capacity  of  the  test  equipment  is  not  sufficient  to  cause  a  realization  of  the 
random  variable.  For  example,  consider  the  problem  of  determining  the 
ultimate  torsional  strength  of  a  propulsion  shaft  for  an  aircraft  carrier.  If  the 
test  equipment  available  does  not  have  the  capacity  to  achieve  the  predicted 
failure  torque,  then  the  experiment  is  said  to  be  limited  in  capacity. 
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A.  CENSORING  OF  AN  EXPERIMENT 

Censoring  the  experiment  means  the  experiment  is  terminated  at  either 
a  scheduled  value  of  the  random  variable  or  by  the  number  of  realizations 
which  have  occurred.  In  a  life  experiment,  scheduled  censoring  means  that 
the  experiment  is  terminated  after  a  preassigned  period  of  time  has  elapsed, 
regardless  of  the  number  of  samples  that  have  failed.  If  the  life  experiment 
was  censored  by  the  number  of  realizations,  then  the  experiment  would  be 
terminated  after  the  predetermined  number  of  samples  have  failed,  regardless 
of  the  time  it  takes  for  those  samples  to  fail.  A  capacity  limited  experiment 
is  essentially  a  schedule  censored  experiment  because  the  it  will  be 
terminated  when  a  specific  value  of  the  random  variable  is  reached,  no 
matter  how  many  samples  have  failed.  Consider  the  previous  example  of  the 
aircraft  carrier  propulsion  shaft.  If  the  test  equipment  in  use  does  not  have 
the  capacity  to  achieve  the  desired  value  of  torque,  then  the  experiment  is 
effectively  censored  (scheduled)  at  the  maximum  torque  that  may  be 
achieved  by  the  test  equipment. 

The  question  which  arises  in  censoring  an  experiment  is  when  should 
the  censor  be  implemented.  Is  it  better,  for  example,  to  allow  10  objects  on 
test  to  proceed  for  the  duration  of  a  three  year  life  experiment,  or  censor  the 
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experiment  at  the  end  of  each  year  and  immediately  put  10  new  samples  on 
test,  thereby  testing  a  total  of  30  samples  for  one  year  each?  Further 
complicating  the  issue,  one  may  ask  whether  it  might  even  be  more  useful 
to  censor  every  six  months,  so  that  the  number  of  repeated  tests  is  six  vice 
three  or  one.  The  advantage  of  censoring  every  six  months  is  that  a  total  of 
60  samples  would  be  put  on  test  as  compared  30  samples  for  the  yearly 
censor,  or  only  10  samples  for  the  case  if  no  censoring  was  utilized.  The 
disadvantage  of  the  six  month  censor  plan  is  that  the  six  months  might  not 
be  enough  time  to  produce  any  failures. 

A  similar  decision  is  required  for  capacity  limited  experiments.  The 
censor  torque  for  the  aircraft  carrier  shaft  example  might  be  the  maximum 
torque  of  the  test  equipment.  However,  if  no  shafts  fail  at  that  torque 
because  it  is  low  relative  to  the  mean  failure  torque,  then  no  data  is  obtained 
for  estimating  the  parameters  of  the  failure  model.  What  is  desired  is  to 
know  the  minimum  down-sizing  of  the  shaft  for  testing  so  that  some  failure 
will  be  observed,  producing  useful  data  for  analysis.  The  reason  the  largest 
shaft  is  desired  is  to  minimize  size  effect  extrapolation  from  the  reduced  size 
test  sample  size  back  to  actual  prototype  dimensions. 
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1.      Effect  of  Censoring  on  the  Estimated  Parameters 

The  solution  of  when  to  censor  either  a  time  or  capacity  limited 
experiment  is  secured  through  information  of  the  likelihood  function.  The 
reason  the  likelihood  function  is  chosen  for  this  purpose  is  that  it  captures 
the  probability  that  a  specific  pair  of  parameters  are  the  ones  which  provide 
the  best  fit  of  the  data.  Since  the  goal  of  performing  the  experiments  is  to 
determine  the  parameters  of  the  failure  model,  the  likelihood  function  will 
serve  to  gauge  the  effect  of  early  censoring  on  the  estimated  parameters. 

The  likelihood  function  of  Eq  B.l  must  be  revised  to  account  for 
the  samples  which  did  not  fail  at  the  point  when  the  experiment  was 
censored.  All  of  the  samples  which  would  have  remained  on  test  are 
therefore  reliable  up  to  the  point  of  censor.  The  likelihood  function  becomes 
the  product  of  the  probabilities  of  the  realized  data  times  the  reliabilities  of 
the  samples  which  survived.  In  equation  form,  the  likelihood  function  for 
n  exact  and  c  censored  samples  of  N  total  samples  becomes 


L  = 


n 


n  m> 


i=i 


[1-F(x.)f~c  (4.D 


Figures  4.1  through  4.3  demonstrate  the  impact  of  censoring  on  the 
likelihood  surface  and  the  corresponding  marginal  distributions  of  the  shape 
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parameter  a  based  on  the  point  a  simulated  experiment  is  censored.  In  the 
simulation,  three  different  numbers  of  realized  data  were  used  as  the  censor 
points.  Additionally,  the  simulation  was  executed  for  three  different  values 
of  the  underlying  values  of  a.  The  motivation  behind  selecting  three 
different  censor  locations  and  underlying  shape  parameters  was  to  evaluate 
the  effect  of  the  number  of  realized  data,  denoted  n,  and  the  amount  of 
scatter  possessed  by  the  experimental  data  have  on  the  estimated  parameters. 
Two  of  the  values  selected  for  ot  in  the  simulations  were  0.2  and  1,  as  they 
represent  typical  high  and  low  values  for  the  shape  parameter  in  fiber  life 
experiments.  The  other  value  was  chosen  to  be  a=5,  a  commonly  observed 
value  for  describing  fiber  strength  data.  Choosing  practical  numbers  such  as 
these  will  provide  some  insight  as  to  how  the  actual  data  obtained  from  the 
fiber  experiments  will  affect  the  estimated  parameters. 

The  likelihood  surfaces  in  Figures  4.1(a)-(c)  are  the  result  of 
censoring  a  simulated  experiment  of  100  samples  which  have  the  intrinsic 
property  of  producing  a  large  amount  of  scatter  in  the  realized  data  (oc=0.2). 
Each  surface  has  the  appearance  of  a  long  runnel  parallel  to  the  p  axis  (the 
location  parameter).  The  interpretation  is  that  when  a  is  small  (large 
scatter),  a  large  range  of  (3's  could  equally  likely  be  the  underlying  location 
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parameter.  The  marginal  distributions  of  ot,  provided  in  Figure  4.1(d),  are 
obtained  by  numerically  integrating  out  the  p  from  the  likelihood  surface. 
There  is  minor  difference  in  the  shapes  of  the  curves  represented  by 
censoring  after  5,  10,  and  25  realizations.  This  is  an  important  result 
because  it  shows  that  little  improvement  in  the  uncertainty  of  the  estimated 
parameter  is  obtained  by  prolonging  the  experiment  so  that  25  failures  would 
occur  instead  of  censoring  it  after  only  five  failures.  This  suggests  that  early 
censoring  is  effective  for  processes  described  by  small  values  of  a.  The 
early  censor  would  therefore  effectively  allow  a  larger  number  of  samples  N 
to  be  tested,  resulting  in  more  early  failures  n  to  be  used  in  the  analysis. 
The  disadvantage  of  implementing  an  early  censor  strategy  is  that  knowledge 
of  the  location  parameter  is  sacrificed,  because  data  near  the  mean  value  is 
never  realized  because  the  experiment  was  terminated  early.  In  life 
experiments,  however,  for  low  values  of  a  the  mean  life  is  of  secondary 
interest.  Recall,  the  most  useful  quantity  is  the  shape  parameter  of  the  lower 
tail  in  the  reliability  region.  The  more  data  which  is  realized  in  the  area  of 
interest,  the  better  the  estimate  of  the  shape  parameter. 

The  effect  of  censoring  a  simulated  experiment  producing  a  data 
set  described  by  an  underlying  a=l  is  given  in  Figure  4.2.    The  resulting 
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(a)  Likelihood  (data  censored  at  5  points) 
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(b)   Likelihood  (data  censored  at  10  points) 
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FIGURE  4.1    EFFECT  OF  CENSORING  ON  LIKELIHOOD  FOR  a=0.2 
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(a)  Likelihood  (data  censored  at  5  points)  (5)  Likelihood  (data  censored  at  10  points) 
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(c)  Likelihood  (data  censored  at  25  points) 
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FIGURE  4.2   EFFECT  OF  CENSORING  ON  LIKELIHOOD  FOR  a=l 
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(a)  Likelihood  (data  censored  at  5  points)  (b)  Likelihood  (data  censored  at  10  points) 
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FIGURE  4.3   EFFECT  OF  CENSORING  ON  LIKELIHOOD  FOR  oc=5 
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data  possesses  much  less  scatter  than  the  previous  data  set  described  by 
oc=0.2.  The  realizations  of  the  data  have  less  variability,  resulting  in 
narrowing  the  range  of  parameters  which  would  be  likely  to  fit  the  data. 
This  is  evident  in  the  likelihood  surfaces  shown  in  Figures  4.2(a)-(c),  and  in 
the  marginal  distributions  of  a  in  Figure  4.2(d).  The  trend  of  reducing  the 
range  of  the  probable  number  of  parameters  as  more  data  is  realized  is 
visible  in  the  shapes  of  the  likelihood  surfaces.  The  likelihood  surface  of 
Figure  4.2(a)  is  that  of  only  five  data  points.  As  illustrated  by  the  'L' 
shaped  surface,  the  are  many  a  and  p  combinations  which  would  be  equally 
likely.  When  more  data  points  are  obtained  through  delaying  the  censor 
(Figure  4.2(c)),  the  shape  of  the  likelihood  surface  shrinks  to  a  smaller 
number  of  probable  parameters.  The  corresponding  marginal  distributions 
of  each  of  these  surfaces  is  provided  in  Figure  4.2(d).  The  significant 
reduction  in  the  uncertainty  in  a  is  obtained  by  extending  the  experiment  to 
achieve  more  realizations  at  higher  values  of  the  random  variable.  The 
observation  in  this  case  is  that,  for  intermediate  variability  (oc=l),  censoring 
such  a  life  experiment  should  be  delayed. 

The  likelihood  surfaces  and  marginal  distributions  of    a  for  the 
data  simulated  for  an  underlying  a=5  are  shown  in  Figure  4.3(a)-(d).    The 
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reduction  of  the  probable  pairs  of  parameters  resulting  from  increasing  the 
number  of  realized  data  prior  to  censoring  is  illustrated  by  the  sharpening  of 
the  likelihood  surfaces  in  Figures  4.3(a)-(c).  The  resulting  marginal 
distributions  of  a  show  that  a  dramatic  reduction  of  uncertainty  in  the 
parameters  is  obtained  by  letting  the  experiment  continue  in  time  vice  censor 
early.  The  reason  for  this  is  that  for  a=5,  the  failures  will  occur  in  a  tighter 
range  about  the  mean  value,  so  that  early  failures  really  do  not  provide  much 
detail  about  the  characteristics  of  the  data. 

Figures  4.1  through  4.3  have  shown  that  the  number  of  realized 
data  and  the  underlying  shape  parameter  of  the  data  have  a  significant  impact 
on  the  nature  of  the  uncertainty  of  the  parameters,  and  therefore  play  a 
crucial  role  in  determining  a  censor  strategy.  For  large  numbers  of  realized 
data,  the  likelihood  surfaces  will  shrink  to  a  spike  about  the  underlying 
parameters,  regardless  of  the  scatter  in  the  data,  making  the  problem  trivial. 
For  the  more  realistic  cases  where  there  is  only  a  small  number  of  realized 
data,  the  knowledge  gain  for  the  estimated  oc  is  larger  than  for  the  estimated 
p  when  oc<l.  If  ool,  however,  the  reverse  is  true.  An  explanation  of  this 
trend  is  achieved  by  referring  back  to  the  appearance  of  the  distributions  in 
the  F*  domain.    In  the  F*  graph,  the  parameter  a  is  the  slope  of  the  line 
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representing  the  distribution,  and  the  value  of  p  is  the  location  of  intersection 
between  the  distribution  and  the  horizontal  line  of  F*=0  (or  F=.632).  If  a<l , 
then  the  slope  of  the  distribution  in  the  F*  domain  is  less  than  45°.  For 
values  of  a«l,  the  distribution  is  nearly  flat,  and  any  variation  of  values 
along  the  F*  axis,  caused  by  errors  in  the  empirical  rank  of  the  data,  would 
affect  the  value  of  p  the  most  because  it  is  the  point  of  intersection  of  two 
nearly  horizontal  lines.  For  <X>1,  the  slope  of  the  distribution  in  the  F* 
domain  is  greater  than  45°.  If  oc»l,  then  the  distribution  is  a  steep  line, 
causing  the  rank  error  to  effect  the  slope  a  more  than  p. 

The  design  variable  in  planning  an  optimum  censor  strategy  for  an 
experiment  is  the  censor  locations,  which  are  bounded  by  earliest  censor  of 
only  one  realization,  and  the  latest  censor  would  be  after  the  last  realization 
of  the  total  N  on  test  (the  trivial  case).  If  the  available  time  of  the 
experiment  is  insufficient  to  produce  the  required  number  of  realizations  of 
the  random  variable,  or  the  test  equipment  is  limited  in  capacity,  then  the 
total  number  of  samples  put  on  test  must  be  increased  by  implementing 
censoring.  The  determination  of  the  optimum  censor  location  is  made 
through  the  application  of  Information  Theory. 
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B.     USING  CENSORING  TO  OPTIMIZE  INFORMATION 

A  univariate  objective  function  for  maximizing  the  knowledge  of  the 
parameters  is  obtained  through  the  application  of  information  theory.  As 
stated  in  Chapter  2,  the  information  /  is  a  scalar  measure  of  how  well  the 
parameters  are  known.    For  convenience,  Eq.  2.5  is  restated  as  follows 

/  =  |p(6)  logp(6)  d6  (4-2) 

where  P{6|D}  is  the  probability  of  the  parameter,  given  a  set  of  data. 
For  large  sample  sizes,  say  N  >  1000,  the  probability  density  of  the 
parameters  is  asymptotically  normal  by  the  Central  Limit  Theorem 
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The  typical  sample  sizes  in  engineering  are  N  <  100,  which  does  not 
produce  a  normal  distribution  of  the  parameters.  As  demonstrated  by 
simulation,  the  marginal  distributions  of  the  shape  parameters  in  Figures  4.1 
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through  4.3,  the  distributions  are  clearly  not  normal.  The  result  is  that  an 
analytical  form  of  the  information  /,  such  as  that  given  in  Eq  4.4,  cannot  be 
utilized.  For  small  data  sets,  the  probability  density  function  of  the 
parameter  is  the  normalized  likelihood  function. 

The  likelihood  function  is  useful  because  it  contains  the  internal 
parameters  of  n,  N,  a,  and  p  and  has  the  advantage  that  it  is  not  distribution 
specific.  In  developing  an  optimum  censor  strategy,  the  objective  is  to  find 
the  censor  location  xc  so  that  multiple  repeated  tests  could  be  performed  to 
maximize  the  certainty  of  the  parameters,  within  the  constraints  of  time  and 
number  of  test  stations.  A  high  level  of  certainty  in  the  parameters  is 
characterized  by  a  spike  in  the  likelihood  surface.  On  the  contrary,  a  low 
level  of  certainty  in  the  parameters  would  be  represented  by  a  more  diffuse 
likelihood  surface.  The  advantage  of  computing  the  information  using  the 
likelihood  function  is  that  it  is  a  method  of  numerically  distilling  the  many 
shapes  of  the  likelihood  surface  into  a  single  parameter. 

The  many  shapes  of  the  likelihood  surface  may  be  determined 
analytically  by  computing  the  0th  moment,  1st  moment,  2nd  moment,  and 
Kurtosis  (mathematics  of  higher  order  moments).  The  drawback  of  using 
these  methods  is  that  the  objective  function  would  become  multivariate  due 
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to  the  number  of  moments  which  would  be  required  to  characterize  the  shape 
and  the  confidence  on  the  estimator  of  the  various  moments  cannot  be 
defined. 

The  optimum  censor  foundation  is  established  by  simulation  as 
previously  illustrated  in  Figure  3.2.  The  simulation  is  conducted  as  if  the 
model  and  parameters  are  known.  The  simulated  data  set  {xj  is  computed 
based  on  the  expected  rank  of  the  ith  sample.   The  expected  rank  is  defined 


as 


F  =  —  ,  i  =  1,  2,  ...,  N  (4.5) 

JV+1 


The  primary  difference  in  this  simulation  for  the  calculation  of  information 
and  previous  simulations  is  that  the  data  set  {X}  was  not  computed  based  on 
a  random  ranking  as  occurrence  in  an  experiment.  Instead,  the  expected 
values  of  the  data  were  computed  from  the  expected  rank.  The  reason  for 
using  the  expected  values  is  that  the  interest  in  this  study  is  to  determine  the 
various  trends  of  information  for  data  sets  possessing  different  amounts  of 
scatter  and  the  impact  of  censoring  on  the  information.  The  use  of  random 
ranking  would  result  in  a  large  number  of  simulations  required  to  be  run 
before  the  desired  trend  would  be  clearly  distinguishable. 
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Information  versus  Scheduled  Censor  Location  for  various  a's 
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FIGURE  4.4   EFFECT  OF  a  ON  INFORMATION  IN  A  SCHEDULE 
CENSORED  EXPERIMENT 

Once  the  simulated  data  set  is  obtained,  the  number  of  realizations,  n, 
is  determined  based  on  the  number  of  x/s  which  have  values  less  than  the 
censor  value  xc.  The  information  /  is  then  computed  based  on  the  n  realized 
data  and  N-n  censored  data.  The  repetitive  computation  of  /  for  various 
censor  strategies  and  different  sets  of  data  makes  it  possible  to  determine  the 
best  xc  for  increasing  /,  and  to  evaluate  when  a  point  of  diminishing  return 
is  reached  when  trying  to  improve  /. 

The  simulations  conducted  in  this  study  are  based  on  a  life  test 
experiment  on  sets  of  data  described  by  three  shape  parameters:  a  =  0.2,  1, 
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and  5  for  the  reasons  discussed  in  Chapter  III.  The  impact  of  both  scheduled 
censoring  and  censoring  by  the  number  of  realizations  was  investigated.  The 
algorithms  and  software  used  in  this  simulation  are  provided  in  Appendix  D. 
The  graph  shown  in  Figure  4.4  is  a  comparison  of  the  effects  of  the 
underlying  value  of  a  for  the  data  on  the  information  /  for  scheduled 
censoring.  The  time  of  censor  axis  is  normalized  by  the  underlying  location 
parameter  p  so  that  the  time  of  the  censor  relative  to  the  mean  life  is  evident. 
The  curve  representing  a=0.2,  or  data  with  a  large  amount  of  scatter,  appears 
nearly  flat.  The  only  dramatic  changes  which  occur  for  this  curve  are  in  the 
region  of  very  early  censor  times.  Therefore,  the  small  amount  of  increase 
in  the  information  of  the  parameters  which  results  from  allowing  the 
experiment  to  continue  without  censoring  is  not  justifiable.  The  increase  in 
/  for  a=l  occurs  rather  quickly  up  to  the  mean  life  and  then  diminishes. 
This  is  due  to  the  increased  grouping  of  the  data  and  the  observation  that 
fewer  failures  occur  early,  resulting  in  little  information  being  gained  for 
early  censor  times.  The  change  in  information  is  even  more  dramatic  for  the 
case  where  a=5.  The  data  is  more  grouped  about  the  mean,  so  that  the 
information  gain  occurs  only  in  this  area. 
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FIGURE  4.5   EFFECT  OF  N  ON  INFORMATION  IN  A  SCHEDULE 
CENSORED  EXPERIMENT  FOR  cx=0.2 

Figure  4.5  is  a  graph  of  information  versus  the  scheduled  censor  times 
for  various  sizes  of  data  sets  for  a=0.2.  The  purpose  of  this  graph  is  to 
evaluate  the  impact  of  increasing  N  on  the  information.  As  shown  in  the 
graph,  the  increase  in  /  is  dramatic  between  N=10  and  N=100  samples. 
Note,  however,  that  the  increase  in  /  for  an  increase  from  N=100  to  N=300 
samples  is  significantly  less.  The  important  implication  of  this  graph  is  that 
there  exists  a  point  of  diminishing  return  in  trying  to  increase  /  by  obtaining 
more  data,  and  that  the  amount  of  increase  is  quantifiable.  Similar  results 
were  obtained  for  a=l  and  a=5,  as  shown  in  Figures  4.6  and  4.7. 
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Information  versus  Scheduled  Censor  Location  (  a  =  1) 
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FIGURE  4.6   EFFECT  OF  N  ON  INFORMATION  IN  A  SCHEDULE 
CENSORED  EXPERIMENT  FOR  cx=l 
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FIGURE  4.7   EFFECT  OF  N  ON  INFORMATION  IN  A  SCHEDULE 
CENSORED  EXPERIMENT  FOR  a=5 
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The  effect  of  censoring  a  experiment  based  on  the  number  of  realized 
data  on  the  information  is  shown  in  Figure  4.8  for  various  underlying  shape 
parameters.  The  abscissa  is  labelled  the  number  of  fractional  realization 
because  the  number  of  realizations  has  been  normalized  by  the  number  of 
samples  put  on  test.  The  value  of  0.1  for  the  fractional  number  of 
realizations  means  that  10%  of  the  samples  tested  have  failed.  The  plot  of 
the  curve  for  a=0.2  appears  initially  flat  and  then  begins  to  increase  at  a 
fairly  constant  rate.  This  is  due  to  the  long,  tunnel  like  shape  of  the 
likelihood  surface  for  small  number  of  data  points  when  a  <  1,  as  was 
shown  in  Figure  4.2.        The  implication  of  this  shape  was  previously 
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Information  versus  Number  of  Realizations  (a  =  0.2) 
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FIGURE  4.9   EFFECT  OF  N  ON  INFORMATION  FOR  AN 
EXPERIMENT  CENSORED  BY  THE  NUMBER  OF  REALIZATIONS 
FOR  a=0.2 

disscussed  and  is  caused  by  the  large  number  of  location  parameters  which 
would  be  likely  to  fit  the  data.  Due  to  constraints  imposed  on  the  numerical 
algorithm  to  integrate  the  volume  of  the  likelihood  surface,  the  upper  limit 
for  integration  in  p  was  approximately  20  times  the  underlying  value.  The 
curve  is  flat  in  the  region  of  fractional  realizations  less  than  0.2  because  the 
slight  changes  which  occur  in  the  shape  of  the  surface  take  place  past  the 
maximum  P  integration  limit,  and  are  therefore  not  detectable.  The 
justification  for  establishing  this  upper  limit  in  p  is  that  changes  which  occur 
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in  the  likelihood  surface  past  the  point  of  20  times  the  mean  life  of  an  object 
are  of  little  practical  use. 

The  curves  representing  oc=l  and  oc=5  in  Figure  4.8  begin  at  lower 
values  of  /  and  increase  faster  than  for  a=0.2.  The  reason  the  information 
is  initially  higher  for  small  values  of  a  is  that  the  curves  representing  the 
probability  density  f(x)  for  the  Weibull  model  are  skewed  toward  the  region 
of  early  failures.  As  the  value  of  a  gets  larger,  the  density  becomes 
narrower  (less  scatter)  and  less  skewed  toward  the  early  failures.  Therefore, 
early  failures  which  occur  when  a  is  large  do  not  provide  much  information 
about  the  underlying  shape  parameter. 

Figures  4.9  through  4.11  are  graphs  which  show  the  influence  of  the 
total  N  samples  put  on  test  on  the  information  for  the  three  a's  of  concern. 
In  each  figure,  the  concept  of  a  point  of  diminishing  return  in  improving  I 
by  more  data  is  validated. 

1.      Optimization  of  Information  in  a  Time  Limited  Experiment 

An  illustration  of  how  recursive  censoring  can  be  used  to  improve 
the  information  of  the  parameters  is  provided  in  Figure  4.12.  This  figure  is 
a  plot  of  I  versus  the  number  of  fractional  realizations  for  a=0.2  for  N=10, 
50,  and  100  total  samples  tested.    Note  that  on  the  top  of  the  graph  an 
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FIGURE  4.10   EFFECT  OF  N  ON  INFORMATION  FOR  AN 
EXPERIMENT  CENSORED  BY  THE  NUMBER  OF  REALIZATIONS 
FOR  a=l 


20- 


Information  versus  Number  of  Realizations  (a  =  5) 


N=100 


N=50 


N=10 


i  i  i  i  |  i  i  i  i  | i i i i | i i i i | i i i i  |  i i  i  '  |  i  '  '  '  |  '  '  '  '  i  '  '  'I  i  '  '  '  ' 

0        0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9        1 

Fractional  number  of  realized  data 


FIGURE  4.11    EFFECT  OF  N  ON  INFORMATION  FOR  AN 
EXPERIMENT  CENSORED  BY  THE  NUMBER  OF  REALIZATIONS 
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additional  axis  of  scheduled  censor  times  is  plotted  which  is  used  to  indicate 
the  fractional  number  of  samples  realized  within  the  specified  time  prior  to 
censoring.  The  scheduled  censor  time  of  0.1  of  the  mean  life  corresponds 
to  the  point  of  approximately  46%  of  the  samples  realized  on  the  fractional 
number  of  realizations  axis.  An  example  of  the  impact  of  an  early  censor 
on  the  information  of  the  parameters  is  shown  on  the  figure.  In  this 
example,  suppose  that  two  sets  of  life  tests  of  50  samples  were  initiated  at 
the  same  point  in  time.  One  test  was  allowed  to  progress  through  the 
duration  of  time  up  to  four  times  the  mean  life,  without  censoring.  The 
other  test  are  censored  at  10%  of  the  mean  life,  and  immediately  50  more 
samples  were  put  on  test.  The  point  of  the  scheduled  censor  at  0.1 
establishes  the  baseline  to  be  used  in  the  comparison  of  the  information 
between  the  two  tests. 

The  quantity  labeled  Ix  is  the  increase  in  information  which  results 
from  the  test  proceeding  to  0.2  of  the  mean  life,  without  censoring.  As 
shown  on  the  graph,  the  amount  of  increase  is  extremely  small.  For  the  test 
which  was  censored,  however,  a  total  of  100  samples  were  put  on  test  for  a 
period  of  time  of  0.1  of  the  mean  life.  The  increase  in  information  is 
labelled  as  U  and  represents  a  drastic  increase  in  the  information.  Even  more 
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Improvement  of  Information  through  Recursive  Censoring 
Large  Increase  in  Information  from  recursive  censor  for  a  =  0.2 
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FIGURE  4.12   INCREASING  INFORMATION  THROUGH  EARLY 
CENSORING  IN  AN  EXPERIMENT 
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important  is  that  the  U  is  a  larger  increase  in  information  than  would  have 
occurred  if  the  test  without  censoring  was  allowed  to  continue  up  to  four 
times  the  mean  life.  The  significance  of  this  is  that  more  information  about 
the  parameters  was  obtained  in  a  fraction  of  the  time  it  would  take  if  no 
censoring  was  utilized. 

2.      Optimization  of  Information  in  a  Capacity  Limited  Experiment 

Information  theory  can  also  be  applied  to  an  experiment  which  is 
limited  by  the  capacity  of  the  test  equipment.  Consider  an  example  in  which 
the  goal  of  a  particular  experiment  is  to  predict  the  reliability  of  a  large  flat 
plate,  such  as  the  decking  on  a  ship.  The  problem  exists  that  the  test 
equipment  available  only  produces  40%  of  the  expected  mean  failure  load 
of  the  plate.  Resolution  is  required  for  the  question  of  whether  the 
appropriate  parameters  can  be  obtained  by  testing  the  full  size  plate.  The 
issue  is  resolved  through  the  application  of  information  theory.  The  goal  is 
to  use  information  theory  to  determine  if  an  experiment  will  produce  better 
certainty  of  the  parameters  by  testing  smaller  scaled  down  sizes  of  the  plate 
or  simply  more  tests  of  the  big  plate. 

The  failure  process  of  the  plate  is  modeled  by  dividing  it  up  into 
many  smaller,  equal-width  strips  of  plate  in  parallel.    If  the  failure  process 
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*(/>.)  =exp{-(Jy}  (4.6) 

Pi 

is  characterized  by  fracture  mechanics,  the  entire  plate  will  fail  when  a  crack 

is    initiated    at    the    weakest    strip    and    propagates    through    the    plate 

catastrophically.   The  Weibull  model  is  selected  for  this  problem  because  it 

is  the  presence  of  a  extreme  value  which  results  in  the  failure.    Reliability 

for  an  individual   strip  of  the  plate  is  given  as 

where:   p;  is  the  random  variable  of  load  for  the  ith  strip 
pj   is  the  mean  load  of  failure  for  each  strip 
a   is  the  shape  parameter  of  the  failure  model 
R(Pi  )  is  the  reliability  for  the  value  of  the  random  variable. 

Since  the  process  is  weakest  link  (or  one  which  is  serial  in  the  failure 

mechanism)  the  reliability  of  the  plate  is  the  product  of  the  reliabilities  of  the 

individual  strips  in  the  plate  as  in 

R  =  Rip,)  R(p2)  ...  R(pk)  (4.7) 

Three  different  sizes  of  the  plate  are  proposed  to  be  tested  utilizing  the 
maximum  load  of  the  test  machine.  The  largest  size  tested  is  chosen  to  be 
the  full  size  plate,  because  no  size  effect  considerations  are  required.  The 
two  smaller  sizes,  one  representing  a  moderate  reduction  in  the  size  of  the 
plate  (medium  plate),  and  the  other  representing  a  large  reduction  in  size 
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(small  plate),  are  determined  from  size  effect  considerations  and  the  desired 
number  of  failures  in  the  experiment. 

The  three  proposed  plates  to  be  tested  are  shown  in  Figure  4.13.  The 
values  of  kL,  kM,  and  ks  represent  the  number  of  parallel  strips  in  the  plates; 
a  unitless  metric  of  the  plate  width  since  all  strips  are  assumed  to  have  both 


(a)  Small  Plate  Model  -  kg  strips 


(b)  Medium  Plate  Model  -  k^  strips 
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FIGURE  4.13   FAILURE  MODELS  FOR  A  PLATE  IN  TENSION 
equal  width  and  length.   The  reliability  of  the  entire  large  plate  is  obtained 
from  Eq  4.7  by  taking  into  account  equal  loading  for  each  strip  and  the 


73 


presence  of  kL  strips 
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Similar  expressions  are  obtained  for  the  reliability  of  the  medium  and  small 
plate.  The  intention  is  to  select  the  proper  sizes  of  the  medium  and  small 
plate  such  that  the  reliability  of  these  smaller  plates  is  equal  to  the  reliability 
of  the  large  plate.  This  makes  it  possible  to  predict  the  reliability  of  the 
large  plate  by  testing  one  which  is  smaller.  In  equation  form,  this  is 
represented  by  equating  the  respective  reliabilities:  RL  =  RM  =  Rs.  These 
equations  are  then  simplified  in  terms  of  the  applied  load  p,  and  the  number 
of  strips  in  the  test  sample  k.    The  resulting  equations  are  written  as 


Pm  = 

(M 

Pl 

k 

Pi  = 

( k  Y 

Pl 

<ks* 

(4.9) 


(4.10) 
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Changes  in  the  sizes  of  the  plate  do  not  affect  the  intrinsic  scatter  in  the 
failure  data  because  the  failure  is  governed  by  the  weakest  link.  Therefore, 
the  values  of  shape  parameters  of  the  three  plates  are  assumed  equal:  ocL  = 
aM  =as.  The  difference  in  the  sizes  affects  the  mean  value  of  the  strength 
since  there  is  a  greater  likelihood  of  possessing  a  very  weak  strip  when  the 
number  of  strips  in  the  plate  is  large.  Fortunately,  the  increase  in  the 
number  of  strips  in  the  structure  reduces  the  individual  loads  on  each  strip 
because  the  load  is  more  distributed. 

Two  different  load  levels,  pM/P]  and  ps/(3,,  were  established  for  the 
medium  and  small  plates  which  would  produce  failures  in  60%  and  90%  of 
the  samples  tested,  respectively.  These  percentages  were  selected  in  order 
to  determine  the  influence  of  the  numbers  of  failures  for  the  smaller  plates 
on  the  information  of  the  parameter  a.  The  loads  which  correspond  to  these 
probabilities  of  failures  F(p/p,)  are  determined  from  the  CDF  of  the  failure 
model  which  is  computed  using  the  expected  values  of  the  parameters.  In 
many  applications  in  engineering,  some  knowledge  is  possessed  about  the 
expected  shape  and  location  parameters  of  the  failure  model.  It  is  assumed 
in  this  example  that  the  underlying  a  =  5,  p  =  1.  A  plot  of  the  resulting 
CDF  is  provided  in  Figure  4.14.    Based  on  this  curve,  the  capacity  limited 
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load  placed  on  each  strip  in  the  large  plate  is  pL/(3  =  0.4,  which  corresponds 
to  failure  of  2%  of  the  samples  tested.  The  size  of  medium  plate  is  selected 
to  produce  60%  of  the  plates  tested  to  fail,  or  pM/p  =  0.97  from  Figure  4.14. 
The  small  plate  is  to  be  sized  such  that  90%  of  the  plates  tested  will  fail, 
resulting  in  ps/p  =  1 .2.  The  purpose  of  choosing  those  particular  loads  is  to 
determine  which  plate  should  be  tested  so  that  the  most  information  about 
the  parameters  is  obtained  from  the  experiment. 

Determination  of  the  sizes  of  the  medium  and  small  plates  relative  to 
the  large  plate  is  obtained  by  substituting  the  corresponding  loads  discussed 
above  in  Eqs  4.9  and  4.10  for  a=5.  The  result  is  that  kL  is  83.86  times 
larger  than  kM,  and  243  times  larger  than  ks. 

The  graph  of  the  Information  /  versus  the  number  of  fractional 
realizations  for  a  process  with  an  underlying  value  a=5  is  provided  in  Figure 
4.15.  In  the  case  of  the  large  plate,  the  probability  of  failure  is  so  low  at  the 
load  at  maximum  capacity  that  the  number  of  plates  tested  was  chosen  to  be 
large,  or  N=100.  The  information  resulting  from  testing  100  large  plates  is 
labelled  by  point  'A'  in  Figure  4.15.  If  the  medium  plates  are  tested  a 
higher  percentage  will  fail,  so  only  N=50  are  tested.  This  increases  the 
information  dramatically,  as  seen  by  the  point  labeled  'B.'       If  10  of  the 
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FIGURE  4.14   UNDERLYING  CDF  FOR  THE  PLATE  FAILURE 
EXAMPLE 

small  plates  are  tested,  almost  all  samples  will  fail,  producing  the 
information  labeled  'C  in  Figure  4.15.  If  50  of  the  small  plates  are  tested, 
the  information  is  increased  further  to  point  'D.' 

The  purpose  of  performing  this  capacity  limited  experiment  is  to 
determine  the  parameters  of  the  failure  model  of  the  large  plate  so  that  the 
reliability  at  various  stresses  may  be  quantified.  Testing  the  large  plate, 
however,  only  produces  a  few  failures  due  to  the  limited  load.  The  result  is 
a  very  small  amount  of  information  /  on  a.  The  size  of  the  plate  can  be 
reduced  to  a  size  which  is  sufficiently  small  such  that  all  samples  which  are 
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Minimizing  the  Loss  of  Information  in  a  Capacity 
Limited  Experiment 


Information  versus  Number  of  Realizations  (  a  =  5  ) 
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FIGURE  4.15   MINIMIZING  THE  LOSS  OF  INFORMATION  IN  A 
CAPACITY  LIMITED  EXPERIMENT 
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tested  will  fail.  The  small  size  of  the  samples  makes  it  possible  to  test  a 
large  number  of  plates  to  failure,  but  the  gain  of  information  on  a  diminishes 
once  a  sufficient  amount  of  data  has  been  collected.  The  number  of  samples 
tested  for  the  small  size  can  be  optimized  to  maximize  the  information  on 
a. 

The  uncertainty,  however,  is  reintroduced  from  the  size  effect 
when  the  information  on  a  is  transformed  from  the  small  sample  back  to  the 
actual  size.  This  is  due  to  a  large  kL/ks  ratio  being  raised  to  the  exponent  of 
1/ot.  Any  uncertainty  in  the  a  will  produce  a  substantial  error  in  the 
predicted  location  parameter  of  the  large  object.  What  is  gained,  therefore, 
in  the  knowledge  of  a  by  testing  more  small  samples  may  be  lost  in  the 
transition  to  the  actual  sample  because  of  the  large  size  effect. 

Increasing  the  size  of  the  sample  tested  will  reduce  the  impact  of 
uncertainty  of  a,  but  the  information  of  a  may  be  lower  than  that  obtained 
from  testing  many  of  the  very  small  samples.  The  objective  then  is  to 
minimize  the  loss  of  information  by  increasing  the  size  of  the  sample  in 
order  to  reduce  the  impact  of  uncertainty  in  a  on  a  large  scaling  effect  ratio. 
In  this  regard,  point  'B'  in  Figure  4.15  would  be  the  desired  target  point  for 
the  design  of  the  experiment. 
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V.     CONCLUSIONS 

The  characterization  of  reliability  requires  a  failure  model  based  on  the 
physics  of  the  failure  process,  and  parameters  based  on  experimental  data. 
The  limitations  of  experiments  in  terms  of  time  or  capacity  give  rise  to  the 
need  for  an  experiment  designed  to  maximize  the  knowledge  of  the 
parameters  within  the  constraints  of  the  limitations.  The  optimality 
parameter  used  was  the  information  /  of  the  parameters  which  distills  the 
multitude  of  shapes  of  the  likelihood  surfaces  into  a  single  parameter. 

This  study  has  demonstrated  that  for  high  variability  data,  such  as  life 
or  fatigue  data,  the  information  obtained  from  experiments  can  be 
dramatically  increased  by  recursive  censoring.  In  cases  of  low  variability 
data,  such  as  data  resulting  from  experiments  limited  in  capacity,  information 
can  be  used  to  determine  the  maximum  structural  dimension  which  would 
result  in  meaningful  estimation  of  the  parameters. 

Due  to  the  large  number  of  possible  shapes  in  the  likelihood  surface,  an 
integration  scheme  which  accommodates  an  adaptive  mesh  requires 
development.   This  would  provide  the  ability  to  compute  information  for  a 
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large  range  of  parameters  without  first  having  to  define  the  bounds  of  the 
likelihood  surface  for  every  data  set. 

The  application  of  information  theory  developed  in  this  study  should  be 
extended  to  cover  a  range  of  parameters  describing  the  randomness  of  data 
and  censoring  strategies  in  order  to  develop  a  set  of  optimality  curves  which 
could  be  used  in  interactive  planning  of  an  experiment  design. 
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APPENDIX  A:    RANK  AND  ORDER  IN  A  DATA  SET 

The  order  of  a  data  set  is  the  arrangement  of  the  data  from  the  smallest 
magnitude  of  the  realized  random  variable  to  the  largest.  An  ordered  data 
set  satisfies  the  following  inequalities  [Bury,  9] 

*i  *  X2  *  '  '  '  *  xn  (AJ) 

The  rank  of  a  particular  sample  value  \  with  respect  to  the  other  values 
within  the  data  set  is  determined  by  its  relative  magnitude.  The  smallest 
value,  or  the  first  value  in  the  ordered  data  set,  would  have  the  lowest 
ranking,  and  conversely,  the  largest  value  would  have  the  highest  ranking. 
There  are  numerous  ranking  methods  available  which  provide  discrete  values 
of  the  CDF  based  on  the  ranking  of  each  sample  in  the  data  set.  One  of  the 
more  widely  accepted  methods  of  determining  F  is  that  of  expected  rank 

F(xt)  =  -i-  i  =  1,  2,  ...  ,  N  (A.2) 

N+l 

where  F(Xj)  is  the  probability  that  a  value  at  least  as  large  as  x{  will  occur  in 
a  data  set  {X}  of  N  samples.     Since  the  value  F(Xj)  can  be  determined 
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without  the  selection  of  a  model,  the  expected  rank  of  an  ordered  data  set 
represents  a  viable  nonparametric  method  in  computing  reliability. 

For  example,  a  fiber  having  a  rank  of  0.25  is  at  least  as  strong  as  25% 
of  the  fibers  tested,  which  is  the  same  as  saying  that  the  probability  failure 
F=0.25.  However,  it  may  subsequently  be  determined  from  additional  testing 
that  the  particular  fiber  is  only  as  strong  as  20%  of  the  fibers  tested,  or 
F=0.2.  Since  all  fibers  cannot  be  tested,  the  true  rank  of  the  fiber  will  never 
be  known,  but  the  rank  serves  as  a  useful  way  of  estimating  the  value  of  F 
so  that  the  data  may  be  plotted. 
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APPENDIX  B:    LIKELIHOOD  AND  MAXIMUM  LIKELIHOOD 

ESTIMATORS 


The  likelihood  function  is  defined  [Bury,  9]  as 

n 

um  =  n  M;e>  (BJ) 

where:    L(X,9)  is  the  likelihood  of  the  parameter(s)  0  given  the  data    set 
X 
f  (Xj,6)  is  the  pd  of  the  i,h  realization  of  x  for  the 

parameter(s)  9 
n  is  the  number  of  samples  in  the  data  set. 

The  symbol  6  used  in  the  equation  is  a  vector  which  represents  both  the 

shape  and  location  parameters  a  and  p  used  in  the  Weibull  distribution. 

Recall  that  the  pdf  is  the  derivative  of  the  CDF  with  respect  to  the  random 

variable,  which  is  given  as  follows  for  the  Weibull  distribution 


/(jc;o,p)  =  |  A"'1  exp[-(^)»]  (B-2) 


The  likelihood  function  is  a  mathematical  representation  of  the 
probability  that  a  selected  pair  of  parameters  describe  a  given  data  set.  A 
large  value  of  likelihood  results  when  the  selected  parameters  are  more  likely 
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to  be  the  underlying  parameters.  This  is  due  to  the  product  of  the  probability 
densities.  If  the  selected  parameters  are  far  from  the  underlying  values,  the 
resulting  probability  densities  will  be  very  small  numbers,  and  the  product 
of  these  numbers  will  be  an  even  smaller  number.  As  the  selected 
parameters  are  in  the  neighborhood  the  underlying  values,  the  probability 
densities  will  become  larger,  and  correspondingly,  the  likelihood  will  become 
larger.  The  point  at  which  the  likelihood  is  the  maximum  represents  the 
most  likely  parameters  which  describe  the  given  data  set,  and  are  referred  to 
as  the  maximum  likelihood  estimators  (MLE).  The  MLE  parameters  can  be 
determined  by  calculus  by  taking  the  derivative  of  likelihood  function  of 
equation  (B.2)  with  respect  to  the  parameters  6.  The  resulting  MLE 
parameter  for  the  Weibull  distribution  are  given  as 


a 


E  V  ^ 


i=l 


1 

n  I=1 


(B.3) 


P  - 


1     - 

-  E  x, 


n   M 


(B.4) 
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Alternatively,  the  values  of  likelihood  for  a  model  consisting  of 
two  parameters,  such  as  in  the  Weibull  model,  can  be  represented  as  a  three- 
dimensional  surface  plot  or  two-dimensional  contour  plot.  The  likelihood 
surfaces  and  contour  plots  for  the  data  sets  represented  by  curves  in  Figure 
3.3  are  given  in  Figure  B.l.  For  the  cases  where  the  likelihood  surface  is 
unimodal,  the  MLE  parameters  (determined  by  Eqs.  (B.3)  and  (B.4))  are  in 
accord  with  the  peaks  on  the  likelihood  contours. 

The  estimators  determined  either  by  the  likelihood  contour  or  by 
MLE  are  based  on  the  given  data  set.  Due  to  the  randomness  associated 
with  each  set  of  data,  different  estimators  are  likely  to  be  obtained  from 
different  sample  sets,   even  from  the  same  population. 
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FIGURE  B.l    LIKELIHOOD  SURFACES  AND  CONTOUR  PLOTS 
FOR  TWO  DATA  SETS 
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APPENDIX  C:    LOWER  TAIL  SUBTLETIES  IN  DATA 

As  an  illustration  of  the  many  subtleties  in  the  lower  tails  of  the 
distributions  encountered  in  random  data  sets,  four  representative  F*  curves 
are  provided  in  Figure  CI.  These  plots  were  obtained  from  randomly 
generated  sets  of  data  of  100  samples.  The  underlying  distributions  which 
are  straight  lines  are  plotted  as  a  guide  for  comparison.  Notice  that  in  some 
cases  the  points  fall  above  the  line,  indicating  data  which  fail  earlier  than  the 
model,  while  in  other  cases  the  points  fall  below  the  lower  portion  of  the 
curve,  indicating  that  the  samples  later  than  the  model.  There  will,  of 
course,  be  cases  in  which  the  data  closely  resembles  the  underlying 
distribution.  The  utility  of  displaying  the  distributions  in  the  F*  domain  is 
clearly  evident  in  these  four  graphs. 
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APPENDIX  D:    SIMULATION  SOFTWARE 
I.     SOFTWARE  DESCRIPTION 

The  purpose  of  this  appendix  is  to  provide  a  description  of  the 
simulation  procedures  and  the  software.  The  rationale  behind  each 
simulation  procedure  implemented  will  be  addressed  in  this  appendix, 
followed  by  the  software  used  to  implement  that  particular  component.  The 
software  programming  used  in  the  simulation  was  developed  using  the 
MATLAB  program  by  The  MathWorks,  Inc. 

A.      SIMULATION  USING  A  NONPARAMETRIC  METHOD  TO 
CHARACTERIZE  RELIABILITY 

This  simulation  was  an  investigation  into  whether  practical  levels  of 
reliability  (i.e.,  0.9999  or  0.99999)  could  be  achieved  without  the  need  of  a 
model.  The  advantage  of  using  a  nonparametric  method  is  that  reliability 
can  be  quantified  based  on  the  data  directly,  without  the  need  of  fitting  a 
particular  distribution  to  the  data.  The  issue  to  be  determined  from  the 
simulation  is  whether  or  not  using  a  nonparametric  method  is  pragmatic  in 
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the  amount  of  data  required  and  the  associated  cost  and  time  to  obtain  the 
data. 

The  quantity  which  is  random  in  any  experiment  is  the  probability  of 
occurrence  of  a  random  variable.  In  terms  of  composite  reliability,  it  is  the 
probability  of  failure  in  the  random  variables  of  strength  and  life  which  is 
random.  This  random  probability  of  failure  can  be  simulated  by  setting  it 
equal  to  a  random  number,  defined  as  having  a  value  between  zero  and  one, 
generated  by  a  computer  program. 

The  data  which  correspond  to  those  probabilities  of  failure  can  be 
computed  using  an  empirical  rank  of  the  data.  The  concepts  of  order  and 
rank  of  data  are  discussed  in  Appendix  A. 

1.       Simulation  Description 

A  set  of  numbers  representing  the  probability  of  failure  FC^)  of  N 
samples  is  generated  as  a  set  of  random  numbers  {F}.  The  data  set  {X} 
cooresponding  to  those  values  of  {F}  are  computed  using  the  Weibull 
distribution  with  known  shape  parameter  a  and  location  parameter  p 


91 


xt  =  P  (-ln(l-F(x()))«  (D1) 


The  data  set  {X}  represents  a  set  of  realized  random  variables  such  as 
strength  or  life,  which  have  underlying  values  of  the  parameters  a  and  p 
that  describe  the  data  set.  In  other  words,  {X}  simulates  an  actual  set  of 
data  that  would  have  been  obtained  by  testing  a  structural  sample,  fiber  or 
composite,  to  failure.  The  difference  in  the  simulated  data  set  {X}  and  a 
data  set  resulting  from  an  actual  test  is  that  the  underlying  parameters  are 
known  for  {X}. 

The  data  set  {X}  is  then  ordered  according  to  Eq  (A.l)  and  the  values 
of  FCx,)  are  computed  using  expected  rank  of  Eq  (A. 2).  In  addition,  the  true 
rank  of  each  sample,  which  is  the  random  number  used  to  compute  x{  is  used 
for  the  purpose  as  a  point  for  comparison  with  the  expected  rank.  It  must 
be  reiterated  that  the  true  rank  of  any  data  point  will  almost  never  be 
absolutely  known. 

1.         Simulation  Software 

%  MATLAB  program  to  compare  nonparametric  method  of  describing  a  set  of 

%  data  using  expected  rank  with  the  known  or  underlying  distribution  of 

%  the  data  set 

% 

%  Developed  by:    LT  James  W.  Coleman,  USN 

% 
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%    Prompt  user  for  input  of  the  number  of  samples  in  the  data  set,  the 

%    number  of  simulated  data  sets  desired,  and  the  underlying  parameters 

%    of  the  Weibull  distribution  used  to  form  the  data  set. 

% 

n=inputC Enter  number  of  samples  in  the  data  set  '); 

iter=input('Enter  number  of  simulations  desired  '); 

alpha  l=input('Enter  underlying  value  of  shape  parameter  alpha  '); 

beta l=input(' Enter  underlying  value  of  location  parameter  beta  '); 

% 

%    Compute  the  expected  rank  for  each  of  the  samples,  and  transform  to 

%    the  ln-ln  space  F*  to  linearize  CDF.    Note  that  expected  rank  depends 

%    only  on  the  number  of  samples  in  the  data  set. 

% 

erank=[l:n]/(n+l); 

fstar=log(-log(  1  -erank)); 

% 

%    Perform  the  simulation  as  many  times  as  specified. 

% 

for  i=l  liter 

% 

%   Generate  a  set  of  random  numbers  and  assign  them  as  the  probability  of 

%    failure  F(xi).    Order  the  probabilities  F(xi)  and  transform  into  the 

%    F*  space 

% 

fxi=rand(l:n); 

fxi=sort(fxi); 

f  starx=log(-log(  1  -fxi)); 

% 

%    Solve  for  the  values  of  xi  (the  set  of  data)  which  correspond 

%    to  those  F(xi)'s,  given  the  shape  parameter  alpha  and  location 

%   parameter  beta.    The  values  of  alpha  and  beta  used  are  the  underlying 

%   parameters  and  the  Weibull  distribution  used  is  the  underlying  distribution. 

% 

xi=betal*(-log(l-fxi)).A(l/alphal); 

% 

%    Compute  the  expected  rank  for  each  of  the  samples,  and  transform  to 

%   the  ln-ln  space  F*  to  linearize  CDF 

% 

erank=[l:n]/(n+l); 

f  star=log(-log(  1  -erank)); 

% 

%   Plot  the  results  of  the  expected  rank  F  and  the  true  rank  values  F(xi) 

%   versus  the  log(xi)  in  the  transformed  F*  space 
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semilogx(xi,fstarx,'-w',xi,fstar,'ow') 

pause 

% 

°/c    If  the  simulated  data  set  displays  interesting  characteristics,  the  user 

%    may  save  it  for  later  analysis.    File  name  will  have  the  form  xi#.dat. 

qsave=input('Enter  1  to  save  this  data  set  xi,  0  to  continue'); 

if  qsave==l 

eval(['save  xi',num2str(iter),'.dat  xi  /ascii']) 

acknw=inputf  Record  parameters  used  for  xi  and  hit  Enter'); 

end 

end 


B.      SIMULATION  TO  DETERMINE  THE  DISTRIBUTIONS  OF 


ESTIMATORS 


1.      Simulation  Description 

In  this  simulation,  a  random  set  of  data  is  generated  using  the 
procedure  described  above,  and  the  maximum  likelihood  estimators  are 
computed  from  the  random  data  set.  The  algorithm  used  to  determine  the 
MLE's  incorporates  the  Golden  Section  Method  to  find  the  zero  of  a 
function  of  one  variable.   The  equation  of  a  is  given  as 


a  = 


1     " 
-  I  £  in,,. 

n  j=1 

n 

(D.2) 
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The  golden  section  method  is  used  to  find  the  zero  of  Eq  D.2;  the  result 
being  the  MLE  d.  Although  not  considered  an  efficient  means  of 
determining  the  zero  of  a  function,  the  golden  section  method  has  the 
advantage  of  guaranteed  convergence  within  a  computable  band  of  certainty. 
The  MLE  $  is  determined  from  the  value  of  a  by  the  following  equation 


p  = 


n  i=1 


*  (D.3) 


Once  the  MLE's  are  computed,  they  are  stored  in  column  vectors.  The 

process  of  generating  the  random  data  set  is  repeated  10,000  times  in  order 

to  produce  a  large  number  of  estimators  to  be  analyzed.  The  estimators  are 

then  sorted  using  a  separate  routine  so  that  the  results  can  be  viewed  in  the 

form  of  a  joint  histogram.   The  marginal  distributions  are  obtained  from  the 

joint  histogram  by  applying  Eq  3.7.     The  amount  of  uncertainty  in  the 

estimators  can  then  be  determined  from  the  resulting  distribution. 

2.     Simulation  Software 

%  MATLAB  program  which  computes  the  Maximum  Likelihood  Estimators  (MLE) 

%  of  a  simulated  random  data  set  xi  of  n  samples.   The  process  is  repeated 

%  10000  times  in  order  to  obtain  a  distribution  of  the  MLE's  corresponding 

%  sets  of  data  of  n  samples.   Each  simulated  data  set  has  known  values  of  the 

%  parameters  in  the  Weibull  distribution  used  to  construct  the  data  from 

%  random  numbers. 

% 

%  Developed  by:    LT  James  W.  Coleman,  USN 
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% 

%    Description  of  the  variables: 

%    max  -  Maximum  number  of  repetitions,  or  #  of  MLE's  computed 

%   count  -  count  number  of  repetitions 

%    n  -  #  of  samples  in  the  simulated  data  sets 

%    stora,storb  -  column  vectors  which  MLE's  alpha'  and  beta'  are  stored 

%    alpha  1, beta  1  -  underlying  values  of  the  parameters 

%    fxi  -  Probability  of  failure  CDF 

%    xi  -  Simulated  random  data  set  with  an  underlying  distribution 

%    xl,xu  -  Upper  and  lower  bounds  used  when  computing  MLE  alpha' 

%    fl,fu  -  Residual  from  MLE  equation  for  alpha  to  be  minimized 

%    niter  -  Number  of  iterations  needed  to  bracket  the  MLE  alpha' 

%    xl,x2  -  Intermediate  values  used  to  compute  MLE  alpha' 

%   fl,f2  -  Residuals  based  on  the  intermediate  values 

%   t  -  Golden  Section  Ratio 

ck    k  -  iterations  performed  within  the  golden  section  method 

%   fmin  -  Minimum  residual  computed  in  the  golden  section  method 

%    alphah  -  MLE  alpha' 

%    betah  -  MLE  beta' 

% 

%    Initialize  variables 

% 

max=  10000; 

count=l; 

n=10; 

stora=zeros(max,  1 ); 

storb=zeros(max,  1 ); 

% 

%    Prompt  user  for  input  of  number  of  samples  and  underlying  parameters. 

% 

n=input(' Enter  number  of  samples  in  the  data  set  '); 

betal=input('Enter  underlying  location  parameter  beta  '); 

alpha l=input('Enter  underlying  shape  parameter  alpha  '); 

% 

%    Begin  repitition  of  the  simulation 

% 

while  count<=max 

% 

%   Generate  a  random  data  set  xi  from  random  numbers  assigned  as  the 

%    CDF  values  F(xi) 

% 

fxi=rand(l:n); 

xi=betal  *(-log(  1  -fxi)).  A(  1/alpha  1 ); 
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% 

%    Solve  for  the  MLE's  using  the  Golden  Section  Method  to  minimize  a 
%   function  of  one  variable.    The  function  solva(alpha,xi)  computes  a  residual 
%    value  f  resulting  from  substituting  the  guessed  value  of  alpha  and  the 
%   data  set  xi  into  the  equation  of  ML  for  alpha.   The  residual,  therefore,  is 
%   the  difference  between  the  computed  value  and  the  most  likely  alpha.    This 
%    is  the  value  being  minimized. 

%   Compute  residual  for  the  upper  and  lower  bounds  xl  and  xu.    These  values 
%   are  set  at  a  factor  of  10  above  and  below  the  underlying  values. 
% 
xl=alphal/10; 
xu=alphal*10; 
fl=solva(xl,xi); 
fu=solva(xu,xi); 
%    Seventeen  iterations  are  required  for  interval  of  uncertainty  of  0.001 19 

niter=17; 
%    Determine  the  lower  intermediate  value  xl  of  alpha  based  on  a  weighting  by 
%    the  golden  section  ratio  t  applied  to  the  lower  bound.  Compute  residual 
%    with  this  value. 
t=0.381966; 
xl=(l-t)*xl+t*xu; 
fl=solva(xl,xi); 
%   Determine  the  upper  intermediate  value  of  alpha  based  on  a  weighting  by 
%    the  golden  section  ratio  t  applied  to  the  upper  bound.  Compute  residual 
%   for  this  value. 
x2=t*xl+(l-t)*xu; 
f2=solva(x2,xi); 
k=4; 

while  k<niter 
if  fl>f2 
%    The  residual  from  xl  is  larger  than  residual  from  x2;  make  the  old  xl 
%   the  new  lower  bound,  xl  (lower  intermediate  value)  becomes  the  old  x2, 
%   the  new  value  of  x2  is  computed  using  the  golden  section  ratio  on  the 
%   new  bounds.  Compute  residual  for  new  x2 
xl=xl; 
fl=fl; 
xl=x2; 
fl=f2; 

x2=t*xl+(l-t)*xu; 
f2=solva(x2,xi); 
k=k+l; 
else 
%    The  residual  from  x2  is  larger  than  residual  from  xl;  make  the  old  x2 
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%    the  new  upper  bound,  x2  (upper  intermediate  value)  becomes  the  old  xl, 
%    the  new  value  of  xl  is  computed  using  the  golden  section  ratio  on  the 
%    new  bounds.  Compute  residual  for  new  xl 
xu=x2; 
fu=f2; 
x2=xl; 
f2=fl; 

xl=(l-t)*xl+t*xu; 
fl=solva(xl,xi); 
k=k+l; 
end 
end 
%    Determine  which  of  the  residuals  is  the  lowest  and  the  value  of  alpha  which 
%   corresponds  to  that  residual. 
fmin=min([fl,fl,f2,fu]); 
if  fl==fmin,  alphah=xl;,  end; 
if  fl==fmin,  alphah=xl;,  end; 
if  f2==fmin,  alphah=x2;,  end; 
if  fu==fmin,  alphah=xu;,  end; 
% 

%    Compute  the  MLE  beta  from  using  the  MLE  alpha  and  store  both  estimators  in 
%   column  vectors  stora  and  storb 
% 
betah=(sum(xi.Aalphah)/n)A(l/alphah); 
stora(count,  1  )=alphah; 
storb(count.  1  )=betah; 
% 
count=count+l; 
end 
%    Save  the  results  for  later  analysis 
save  a_510.dat  stora  /ascii 
save  b_510.dat  storb  /ascii 


C.      SIMULATION  TO  COMPUTE  INFORMATION  IN  AN 
EXPERIMENT 
1.       Simulation  Description 
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The  simulation  developed  here  will  compute  the  information 
resulting  in  either  a  schedule  censored  or  number  of  data  censored 
experiment.  The  data  set  generated  in  the  simulation  is  the  expected  values 
of  the  realized  random  variable  x,  which  correspond  to  the  expected  rank  and 
the  underlying  values  of  the  parameters  specified  by  the  user.  The  user 
inputs  the  the  underlying  value  of  a,  the  number  of  samples  in  the 
experiment,  and  a  vector  containing  either  the  scheduled  censor  locations 
(fraction  of  mean)  or  the  censor  points  based  on  the  fractional  number  of 
realized  data.  The  program  computes  the  likelihood  of  the  parameters  for 
each  censored  data  set.  The  parameter  space  is  defined  from  0.25  to  4  times 
the  underlying  value  of  alpha,  and  0.05  to  150  times  the  underlying  value  of 
beta.  The  large  range  of  beta  is  required  when  the  underlying  alpha  is  less 
than  unity  and  only  a  small  number  of  points  are  realized. 

The  marginal  distributions  of  the  parameters  are  obtained  by 
projecting  the  likelihood  surface  on  the  respective  axes.  The  marginal 
distribution  of  alpha  is  then  integrated  and  normalized  by  the  resulting  area. 
The  information  is  computed  based  on  the  marginal  distribution  of  alpha  and 
stored  corresponding  to  that  particular  censor  location.     The  process  is 
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repeated  until  the  information  has  been  calculated  for  each  of  the  input 

censor  points. 

2.       Simulation  Software 

%    MATLAB  Program  to  compute  information  based  on  time  censored  data.    The  user 

%   definesthe  input  %values  for  the  underlying  alpha.  The  location  parameter  beta  is  % 

fixed  at  100  for  this  particular  grid.    The  vector  tcensor  defines  when  the  data  is    % 

censored   (fraction  of  the  mean).   The  numerical  values  of  information  are 

%    contained  in  the  vector  I. 

% 

%  Developed  by:    LT  James  W.  Coleman,  USN 

% 

%   Define  underlying  parameters  and  range  of  alphas 

% 

alpha  l=input( 'enter  underlying  value  of  alpha    ') 

alphamax=4*alpha  1 ; 

alphamin=alphal/4; 

astep=(alphamax-alphamin)/50; 

alpha=alphamin:astep:alphamax-astep; 

beta  1=100; 

% 

%   User  defines  number  of  samples  in  data  set 

% 

n=input( 'enter  number  of  samples  in  the  data  set    ') 

% 

%    Determine  the  set  of  expected  data,  based  on  expected  rank 

% 

for  k=l:n,fxi(k)=k/(n+l);,end 

xi=betal*(-log(l-fxi)).A(l/alphal); 

xi=sort(xi); 

qcensr=input('enter  0  for  scheduled  censor,  1  for  censor  based  on  failures  ') 

if  qcensr==0 

% 

%    Define  the  censor  locations  as  a  fraction  of  the  mean 

% 

tcensor=input('enter  censor  points  as  fraction  of  mean  life    ') 

lmax=length(tcensor); 
else 

xcb=input( 'enter  censor  points  as  fractional  realizations    ') 

lmax=length(xcb); 
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end 
% 

%    Initialize  variables  and  the  likelihood 
% 

count=l; 
inc=l; 
flag  1=0; 
flag2=0; 
L=zeros(50,50); 
while  count<=lmax 
if  qcensr==0 
% 

%    Determine  the  time  of  censor  and  use  this  value  to  define  which  points  are  exact 
% 

xc=tcensor(count)*betal 
index=max(find(xi<=xc)); 
xie=xi(l:index); 
c=length(xie) 
else 
% 

%    Determine  the  exact  data  and  the  number  of  exact  data 
% 

nreal=xcb(count)*(n); 
xie=xi(l:nreal); 
c=length(xie) 
if  c<=l 

xc=xie; 
else 

xc=max(xie); 
end 
end 

beta=5:5:250; 
iter=l; 

while  iter<=3 
% 

%   Compute  likelihood  (50x50  array)  for  censored  data 
% 

L=zeros(50); 
if  c~=0 
for  i=l:50 
for  j= 1:50 
a=alpha(i); 
b=beta(j); 
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xib=xie/b; 

Ll=((a/b)Ac)*prod((xib.A(a-l)).*exp(-(xib.Aa))); 
L2=(exp(-(xc/b).Aa))A(n-c); 
L(i,j)=Ll*L2; 
end 

end 

m=length(alpha); 

a=l; 

b=l; 
% 

%   Obtain  marginal  distributions  of  alpha  by  summing  along  the  betas 
% 

for  u=l :50,ma(u)=sum(L(u,:));,end 

if  iter==l,mal=ma';,end 

if  iter==2,ma2=ma,;,end 

if  iter==3,ma3=ma';,end 
% 

%    Normalize  alphas  for  a  common  integration  range  by  dividing  astep  by 
%    alpha  1 
% 

delta=astep/alphal 
% 

%    Compute  the  area  under  the  marginal  distribution  and  normalize  the  values 
% 

area(iter)=numintg(m,a,b,delta,ma) 

ma=ma/area(iter); 

% 

%   Compute  the  information  based  on  the  marginal  distribution  of  alpha 

% 

info=zeros(50,l); 
if  iter==l 
for  u= 1:50 
if  ma(u)~=0.0 

info(u)=info(u)+ma(u)*log(ma(u)); 
end 
end 

I(count)=sum(info); 
end 
%   This  test  is  to  determine  if  the  likelihood  is  required  to  be  computed  for  the  higher 
%     values   of  beta,  such  as  used  in  iter=2  and  3.    As  more  data  is  realized,  the 
%     likelihood  surface  narrows  to  the  region  where  beta=  5:250;  no  longer  requiring 
%    the  computation  for  high  values  of  beta.   The  test  compares  the  differences  in  the 
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%  area  from  iter=2  to  iter=l.    If  the  area  computed  in  iter=2  is  very  small  compared  % 

to  iter=l,  then  additional  computation  of  likelihood  in  that  range  of  betas. 

% 

if  (iter==2)&(flagl==0) 
diffa  1 2=area(2)/area(  1 ); 
if  diffa  1 2<  1  .e-6,flag  1  =  1  ,end 
end 

if  iter==3 
if  flag2==0 
diffa  1 3=area(3)/area(  1 ); 
if  diffa  13<l.e-6,flag2=l, end 
end 
% 

%    If  all  three  iterations  are  required  to  define  the  likelihood,  revise  the  previous  value 
%     of  I 
% 

info=zeros(50,l); 
mat=ma  1  +ma2+ma3; 
ma=mat'; 
% 

%   Compute  the  area  under  the  marginal  distribution  and  normalize  the  values 
% 

areat=numintg(m,a,b,delta,ma); 
ma=ma/areat; 
% 

%     Compute  the  information  for  this  censor  location 
% 

for  u=  1:50 
if  ma(u)~=0.0 

info(u)=info(u)+ma(u)*log(ma(u)); 
end 
end 

I(count)=sum(info); 
end 
I 
else 
% 

%   If  no  data  was  realized,  set  I  =  0 
% 

l(count)=0; 
I 
end 
% 
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%    Check  the  flags  to  determine  if  large  values  of  beta  are  required  for  the  next 

%     likelihood  calcuation 

% 

if  flag2==l,iter=3;,end 

if  (flagl==l)&(iter==2),iter=3;,end 

iter=iter+l 
% 

%    Define  the  regions  for  the  large  values  of  beta 
% 

if  iter==2,beta=250: 100:5 150;,end 

if  iter==3,beta=5 150:200: 14950;,end 
% 

%    Initialize  variables  for  next  iteration 
% 

ma=zeros(l,50); 

mat=zeros(50,l): 

L=zeros(50,50); 
end 
% 

%    Initialize  variables  for  next  iteration 
% 
count=count+l 
inc=inc+l; 
L=zeros(50,50); 
end 
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